How to match and highlight all terms in any order from an array of strings?

The requirements are:

  • Find strings from an array (from here on called options) that include ALL the terms in ANY order
  • Correctly highlight the matching terms - ie. insert a string before and after each matching term - I'm using <u> and </u> here
  • Both the search query and options can be "anything"
  • For the sake of simplicity answers can concentrate on case-insensitive search through a list including only ASCII characters and assume that the term delimiter is a plain space - ie. a query entered as "Foo bar baz" means the search terms are ['foo', 'bar', 'baz'] .

    To clarify:

  • All terms must exist separately from each other in matched options - ie. shorter terms should not exist only as substrings of longer terms and no two terms should overlap
  • Repeated search terms must exist in the option at least as many times as in the query
  • The final application is (perhaps not surprisingly) an autocomplete of sorts.

    TL;DR

    Most recent fiddle comparing the proposed algorithms side by side:
    https://jsfiddle.net/Mikk3lRo/ndeuqn02/7/
    (feel free to update this link if you add a new algorithm)

    jsPerf comparing algorithms in a somewhat more realistic / representative way - a few strings are basically "entered" one character at a time on each rep:
    https://jsperf.com/comparison-of-algorithms-to-search-and-highlight

    At this point it is clear (thanks to trincot's excellent base-comparison) that the majority of time used by the original implementations was spent on DOM-output. Its significance has been minimized as much as possible in the fiddle.

    There is still a clear difference in performance between the algorithms in each search, but not one of them is consistently fastest on every keystroke. After revisiting and cleaning up my own "Divide and Conquer" it does outperform the others consistently in any realistic scenario I try though.

    Tigregalis introduced the idea of a pre-run optimization, which seems reasonable given that options are unlikely to change between keystrokes. I have added (a function for) this to all methods here. The only algorithm where I saw an obvious benefit from it was in Skirtle's Permutations, but I'll encourage each answerer to consider if it might be useful for their own algorithms.

    Some algorithms will be much easier to adapt than others. It is still my opinion that this will be more important than the minor performance differences in a real implementation.

    Note that the current version of Tigregalis' Shrinking Set has a bug - I've excluded it from fiddle and jsperf until that is fixed.


    Viral Permutations

    In theory this can be solved by "manually" constructing a RegExp that contains every possible permutation of the search terms separated by a capturing group to catch anything between terms - a search for "foo bar" results in (foo)(.*?)(bar)|(bar)(.*?)(foo) .

    The highlighting is then done in one pass with string.replace() . If there is any change in the string we have a match.

    Demo:

    var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People's Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People's Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];
    var terms, terms_esc;
    function viral_permutations() {
        var t0, t1, i, permuted, res_elm, meta_elm, regex_string, regex, li, option, match_groups, highlighted;
        meta_elm = document.getElementById('viral_permutations_meta');
        res_elm = document.getElementById('viral_permutations_result');
        res_elm.innerHTML = meta_elm.innerHTML = '';
        
        t0 = performance.now();
        
        //Split query in terms at delimiter and lowercase them
        terms = document.getElementById('viral_permutations').value.split(/s/).filter(function(n) {
            return n;
        }).map(function(term){
            return term.toLowerCase();
        });
        meta_elm.innerHTML += 'Terms: ' + JSON.stringify(terms) + '<br>';
            
        //Escape terms
        terms_esc = terms.map(function(term) {
            return term.replace(/[-[]{}()*+?.,^$|#s]/g, "$&");
        });
        
        //Wrap terms in in individual capturing groups
        terms_esc = terms.map(function(term) {
            return '(' + term + ')';
        });
        
        //Find all permutations
        permuted = permutate_array(terms_esc);
        
        //Construct a group for each permutation
        match_groups = [];
        for (var i in permuted) {
            match_groups.push(permuted[i].join('(.*?)'));
        }
        
        try {
            //Construct the final regex
            regex_string = match_groups.join('|');
            //Display it
            document.getElementById('viral_permutations_regex').innerHTML = regex_string;
            meta_elm.innerHTML += 'RegExp length: ' + regex_string.length + '<br>';
            
            regex = new RegExp(regex_string, 'i');
        
    
            //Loop through each option
            for (i = 0; i < options.length; i++) {
                option = options[i];
    
                //Replace the terms with highlighted terms
                highlighted = option.replace(regex, viral_permutations_replace);
    
                //If anything was changed (or the query is empty) we have a match
                if (highlighted !== option || terms.length === 0) {
                    //Append it to the result list
                    li = document.createElement('li');
                    li.innerHTML = highlighted;
                    res_elm.appendChild(li);
                }
            }
    
            //Print some meta
            t1 = performance.now();
            meta_elm.innerHTML += 'Time: ' + (Math.round((t1 - t0) * 100) / 100) + 'ms';
        } catch(e) {
            meta_elm.innerHTML += '<span style="color:red">' + e.message + '</span>';
        }
    }
    
    //The replacement function
    function viral_permutations_replace() {
        var i, m, j, retval, m_cased, unmatched;
        retval = '';
        
        //Make a clone of the terms array (that we can modify without destroying the original)
        unmatched = terms.slice(0);
        
        //Loop arguments between the first (which is the full match) and
        //the last 2 (which are the offset and the full option)
        for (i = 1; i < arguments.length - 1; i++) {
            m = arguments[i];
            
            //Check that we have a string - most of the arguments will be undefined
            if (typeof m !== 'string') continue;
            
            //Lowercase the match
            m_cased = m.toLowerCase();
            
            //Append it to the return value - highlighted if it is among our terms
            j = unmatched.indexOf(m_cased);
            if (j >= 0) {
                //Remove it from the unmatched terms array
                unmatched.splice(j, 1);
                retval += '<u>' + m + '</u>';
            } else {
                retval += m;
            }
        }
        return retval;
    }
    
    //Helper function to return all possible permutations of an array
    function permutate_array(arr) {
        var perm, perm_intern;
        perm_intern = function(perm, pre, post, n) {
            var elem, i, j, ref, rest;
            if (n > 0) {
                for (i = j = 0, ref = post.length; 0 <= ref ? j < ref : j > ref; i = 0 <= ref ? ++j : --j) {
                    rest = post.slice(0);
                    elem = rest.splice(i, 1);
                    perm_intern(perm, pre.concat(elem), rest, n - 1);
                }
            } else {
                perm.push(pre);
            }
        };
        perm = [];
        perm_intern(perm, [], arr, arr.length);
        return perm;
    }
    viral_permutations();
    <input type="text" id="viral_permutations" onkeyup="viral_permutations()">
    <p id="viral_permutations_meta"></p>
    <pre style="width:100%;overflow:auto;background:#eee" id="viral_permutations_regex"></pre>
    <ul style="height:300px;overflow:auto" id="viral_permutations_result"></ul>

    I gave it a go but I'm not sure how much help this will be. My approach is similar to your Divide and Conquer technique.

    Instead of biting off bits of the string I search for each term ahead of time and store up a collection of all the matches, recording the start and finish positions. If there aren't enough matches for a particular search term the algorithm immediately bails for that 'option'.

    Once it has gathered together all of the possible matches it recursively attempts to find a combination that doesn't overlap. There's a lot of copying of data structures going on in that recursion and I suspect it could be optimized a lot better than I have it here. I can also only apologize for some of the variable names, I was struggling to come up with names that made any sense at all.

    For certain test searches, such as anananan ... it seems to cope better than the original Divide and Conquer technique but I suspect that may be just because of the early bail-out that is performed when insufficient matches are found for a particular search term. Without a large quantity of real data it's difficult to know where the really valuable optimizations would be.

    function search() {
        var options = [
            'ababababa',
            'United States',
            'United Kingdom',
            'Afghanistan',
            'Aland Islands',
            'Albania',
            'Algeria',
            'American Samoa',
            'Andorra',
            'Angola',
            'Anguilla',
            'Antarctica',
            'Antigua and Barbuda',
            'Argentina',
            'Armenia',
            'Aruba',
            'Australia',
            'Austria',
            'Azerbaijan',
            'Bahamas',
            'Bahrain',
            'Bangladesh',
            'Barbados',
            'Belarus',
            'Belgium',
            'Belize',
            'Benin',
            'Bermuda',
            'Bhutan',
            'Bolivia, Plurinational State of',
            'Bonaire, Sint Eustatius and Saba',
            'Bosnia and Herzegovina',
            'Botswana',
            'Bouvet Island',
            'Brazil',
            'British Indian Ocean Territory',
            'Brunei Darussalam',
            'Bulgaria',
            'Burkina Faso',
            'Burundi',
            'Cambodia',
            'Cameroon',
            'Canada',
            'Cape Verde',
            'Cayman Islands',
            'Central African Republic',
            'Chad',
            'Chile',
            'China',
            'Christmas Island',
            'Cocos (Keeling) Islands',
            'Colombia',
            'Comoros',
            'Congo',
            'Congo, The Democratic Republic of The',
            'Cook Islands',
            'Costa Rica',
            'Cote D'ivoire',
            'Croatia',
            'Cuba',
            'Curacao',
            'Cyprus',
            'Czech Republic',
            'Denmark',
            'Djibouti',
            'Dominica',
            'Dominican Republic',
            'Ecuador',
            'Egypt',
            'El Salvador',
            'Equatorial Guinea',
            'Eritrea',
            'Estonia',
            'Ethiopia',
            'Falkland Islands (Malvinas)',
            'Faroe Islands',
            'Fiji',
            'Finland',
            'France',
            'French Guiana',
            'French Polynesia',
            'French Southern Territories',
            'Gabon',
            'Gambia',
            'Georgia',
            'Germany',
            'Ghana',
            'Gibraltar',
            'Greece',
            'Greenland',
            'Grenada',
            'Guadeloupe',
            'Guam',
            'Guatemala',
            'Guernsey',
            'Guinea',
            'Guinea-bissau',
            'Guyana',
            'Haiti',
            'Heard Island and Mcdonald Islands',
            'Holy See (Vatican City State)',
            'Honduras',
            'Hong Kong',
            'Hungary',
            'Iceland',
            'India',
            'Indonesia',
            'Iran, Islamic Republic of',
            'Iraq',
            'Ireland',
            'Isle of Man',
            'Israel',
            'Italy',
            'Jamaica',
            'Japan',
            'Jersey',
            'Jordan',
            'Kazakhstan',
            'Kenya',
            'Kiribati',
            'Korea, Democratic People's Republic of',
            'Korea, Republic of',
            'Kuwait',
            'Kyrgyzstan',
            'Lao People's Democratic Republic',
            'Latvia',
            'Lebanon',
            'Lesotho',
            'Liberia',
            'Libya',
            'Liechtenstein',
            'Lithuania',
            'Luxembourg',
            'Macao',
            'Macedonia, The Former Yugoslav Republic of',
            'Madagascar',
            'Malawi',
            'Malaysia',
            'Maldives',
            'Mali',
            'Malta',
            'Marshall Islands',
            'Martinique',
            'Mauritania',
            'Mauritius',
            'Mayotte',
            'Mexico',
            'Micronesia, Federated States of',
            'Moldova, Republic of',
            'Monaco',
            'Mongolia',
            'Montenegro',
            'Montserrat',
            'Morocco',
            'Mozambique',
            'Myanmar',
            'Namibia',
            'Nauru',
            'Nepal',
            'Netherlands',
            'New Caledonia',
            'New Zealand',
            'Nicaragua',
            'Niger',
            'Nigeria',
            'Niue',
            'Norfolk Island',
            'Northern Mariana Islands',
            'Norway',
            'Oman',
            'Pakistan',
            'Palau',
            'Palestinian Territory, Occupied',
            'Panama',
            'Papua New Guinea',
            'Paraguay',
            'Peru',
            'Philippines',
            'Pitcairn',
            'Poland',
            'Portugal',
            'Puerto Rico',
            'Qatar',
            'Reunion',
            'Romania',
            'Russian Federation',
            'Rwanda',
            'Saint Barthelemy',
            'Saint Helena, Ascension and Tristan da Cunha',
            'Saint Kitts and Nevis',
            'Saint Lucia',
            'Saint Martin (French part)',
            'Saint Pierre and Miquelon',
            'Saint Vincent and The Grenadines',
            'Samoa',
            'San Marino',
            'Sao Tome and Principe',
            'Saudi Arabia',
            'Senegal',
            'Serbia',
            'Seychelles',
            'Sierra Leone',
            'Singapore',
            'Sint Maarten (Dutch part)',
            'Slovakia',
            'Slovenia',
            'Solomon Islands',
            'Somalia',
            'South Africa',
            'South Georgia and The South Sandwich Islands',
            'South Sudan',
            'Spain',
            'Sri Lanka',
            'Sudan',
            'Suriname',
            'Svalbard and Jan Mayen',
            'Swaziland',
            'Sweden',
            'Switzerland',
            'Syrian Arab Republic',
            'Taiwan, Province of China',
            'Tajikistan',
            'Tanzania, United Republic of',
            'Thailand',
            'Timor-leste',
            'Togo',
            'Tokelau',
            'Tonga',
            'Trinidad and Tobago',
            'Tunisia',
            'Turkey',
            'Turkmenistan',
            'Turks and Caicos Islands',
            'Tuvalu',
            'Uganda',
            'Ukraine',
            'United Arab Emirates',
            'United Kingdom',
            'United States',
            'United States Minor Outlying Islands',
            'Uruguay',
            'Uzbekistan',
            'Vanuatu',
            'Venezuela, Bolivarian Republic of',
            'Viet Nam',
            'Virgin Islands, British',
            'Virgin Islands, U.S.',
            'Wallis and Futuna',
            'Western Sahara',
            'Yemen',
            'Zambia',
            'Zimbabwe'
        ];
    
        var terms = document.getElementById('search').value.trim().toLowerCase().split(/s+/);
    
        if (!terms[0]) {
            terms = [];
        }
    
        document.getElementById('terms').innerText = 'Terms: ' + JSON.stringify(terms);
    
        var startTime = performance.now();
    
        // Term counts is a map storing how many times each search term appears in the query
        var termCounts = {};
    
        terms.forEach(function(term) {
            termCounts[term] = (termCounts[term] || 0) + 1;
        });
    
        // An array of search terms with the duplicates removed
        var uniqueTerms = Object.keys(termCounts);
    
        // Loop through each option and map to either a highlight version or null
        options = options.map(function(optionText) {
            var matches = {},
                lastMatchIndex = {},
                option = optionText.toLowerCase();
    
            uniqueTerms.forEach(function(term) {
                // This array will be populated with start/end position of each match for this term
                matches[term] = [];
    
                // The index of the last match... which could be deduced from the matches but this is slightly easier
                lastMatchIndex[term] = -1;
            });
    
            var incompleteMatchTerms = uniqueTerms.slice(),
                nextMatchTerm;
    
            // This is probably a premature optimization but doing it this
            // way ensures we check that each search term occurs at least
            // once as quickly as possible.
            while (nextMatchTerm = incompleteMatchTerms.shift()) {
                var nextMatchIndex = option.indexOf(nextMatchTerm, lastMatchIndex[nextMatchTerm] + 1);
    
                if (nextMatchIndex === -1) {
                    // Haven't found enough matches for this term, so the option doesn't match
                    if (termCounts[nextMatchTerm] > matches[nextMatchTerm].length) {
                        return null;
                    }
                }
                else {
                    // Found another match, put the term back on the queue
                    // for another round
                    incompleteMatchTerms.push(nextMatchTerm);
                    lastMatchIndex[nextMatchTerm] = nextMatchIndex;
    
                    matches[nextMatchTerm].push({
                        start: nextMatchIndex,
                        end: nextMatchIndex + nextMatchTerm.length
                    });
                }
            }
    
            // Pass in the original array of terms... we attempt to highlight in the order of the original query
            var highlights = performHighlight(terms, matches);
    
            if (!highlights) {
                return null;
            }
    
            // We need the highlights sorted so that we do the replacing from the end of the string
            highlights.sort(function(h1, h2) {
                return h2.start - h1.start;
            });
    
            highlights.forEach(function(highlight) {
                optionText = optionText.slice(0, highlight.start)
                        + '<u>' + optionText.slice(highlight.start, highlight.end) + '</u>'
                        + optionText.slice(highlight.end);
            });
    
            return optionText;
    
            function performHighlight(terms, allMatches) {
                // If there are no terms left to match we've got a hit
                if (terms.length === 0) {
                    return [];
                }
    
                var nextTerms = terms.slice(),
                    term = nextTerms.shift(),
                    matches = allMatches[term].slice(),
                    match;
    
                while (match = matches.shift()) {
                    var nextMatches = {};
    
                    // We need to purge any entries from nextMatches that overlap the current match
                    uniqueTerms.forEach(function(nextTerm) {
                        var nextMatch = term === nextTerm ? matches : allMatches[nextTerm];
    
                        nextMatches[nextTerm] = nextMatch.filter(function(match2) {
                            return match.start >= match2.end || match.end <= match2.start;
                        });
                    });
    
                    var highlights = performHighlight(nextTerms, nextMatches);
    
                    if (highlights) {
                        highlights.push(match);
    
                        return highlights;
                    }
                }
    
                return null;
            }
        });
    
        document.getElementById('results').innerHTML = options.map(function(option) {
            if (option) {
                return '<li>' + option + '</li>';
            }
    
            return '';
        }).join('');
    
        document.getElementById('time').innerText = Math.round((performance.now() - startTime) * 100) / 100 + 'ms';
    }
    <h1>Permutations</h1>
    <input type="text" id="search" onkeyup="search()" autocomplete="off">
    <p id="terms"></p>
    <p id="time"></p>
    <ul id="results"></ul>

    I would suggest a slight variant on the divide and conquer idea: instead of splitting the string into chunks (bites), you could "wipe out" the characters that have been matched, and perform further searches on that one string. The character to wipe out with would be the separator, as it is guaranteed to not occur in any of the terms.

    Here it is:

    function trincotWipeSearch(query, options, separator) {
        // Split query in terms at delimiter
        const terms = query.split(separator).filter(Boolean);
        if (!terms.length) return options;
        // Sort terms by descending size
        terms.sort( (a,b) => b.length - a.length );
    
        // Escape terms, and enrich with size of original term
        // and a string of the same length filled with the separator char
        const items = terms.map(term => ({
            size: term.length,
            wipe: separator.repeat(term.length), 
            regex: new RegExp(term.replace(/[-[]{}()*+?.,^$|#s]/g, "$&"), 'gi')
        }));
    
        function getOffsets(termIndex, text) {
            // All terms found?
            if (termIndex >= terms.length) return [];
            let match;
            const { regex, size, wipe } = items[termIndex];
            regex.lastIndex = 0;
            while (match = regex.exec(text)) {
                let index = match.index;
                // Wipe characters and recurse to find other terms
                let offsets = getOffsets(termIndex+1, 
                    text.substr(0, index) + wipe + text.substr(index + size));
                if (offsets !== undefined) { 
                    // Solution found, backtrack all the way
                    return offsets.concat([index, index + size]);
                }
                regex.lastIndex = match.index + 1;
            }
        }
    
        // Loop through each option
        return options.map( option => {
            // Get the offsets of the matches
            let offsets = getOffsets(0, option);
            if (offsets) {
                // Apply the offsets to add the markup
                offsets
                    .sort( (a,b) => b - a )
                    .map((index, i) => {
                        option = option.substr(0, index) 
                            + (i%2 ? "<u>" : "</u>")
                            + option.substr(index);
                    });
                return option;
            }
        }).filter(Boolean); // get only the non-empty results
    }
    
    var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People's Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People's Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];
    
    /*
     * I/O and performance measurements 
     */
    
    function processInput() {
        var query = this.value.toLowerCase();
        const t0 = performance.now();
        const matches = trincotWipeSearch(query, options, ' ');
        const spentTime = performance.now() - t0;
        // Output the time spent
        time.textContent = spentTime.toFixed(2);
        // Output the matches
        result.innerHTML = '';
        for (var match of matches) {
            // Append it to the result list
            var li = document.createElement('li');
            li.innerHTML = match;
            result.appendChild(li);
        }
    }
    
    findTerms.addEventListener('keyup', processInput);
    processInput.call(findTerms);
    ul { 
        height:300px;
        font-size: smaller;
        overflow: auto;
    }
    Input terms: <input type="text" id="findTerms"><br>
    
    <h3>Trincot's Wipe Search</h3>
    Time: <span id="time"></span>ms<br>
    <ul id="result"></ul>

    Divide and Conquer

    Somewhat more complicated than the single-regex Viral Permutations strategy - this recursive algorithm searches for each term one after the other, starting with the longest term.

    Each time a match is found it divides that "bite" into three (unless at the beginning or end), marking the matched "bite" as consumed, and attempts to match the next-longest term in any of the unconsumed "bites".

    When it is unable to find the longest unmatched term, it will backtrack and attempt to match the previous term at a different position (even in a different "bite").

    If it comes back to the longest term and cannot find another position to match it at, it will return false.

    This means that in most cases it can return non-matches pretty quickly, simply because they don't even contain the longest term.

    Of course if it runs out of terms - ie. successfully matches the shortest - it will return the highlighted match by joining all the "bites" back together.

    Demo:

    Updated for improved performance : The base algorithm is exactly the same, but there were some pretty expensive calls to arr.slice() that could be avoided completely.

    function divide_and_conquer_replace(query, options, separator) {
        var terms, terms_esc;
    
        //The inner replacement function
        function divide_and_conquer_inner(bites, depth) {
            var this_term, i, bite, match, new_bites, found_all_others;
    
            depth = depth ? depth : 1;
    
            //Get the longest remaining term
            this_term = terms_esc[terms_esc.length - depth];
    
            //Loop all the bites
            for (i = 0; i < bites.length; i++) {
                bite = bites[i];
    
                //Reset the lastIndex since we're reusing the RegExp objects
                this_term.lastIndex = 0;
    
                //Check that we have a string (ie. do not attempt to match bites
                //that are already consumed)
                if (typeof bite === 'string') {
                    //Find the next matching position (if any)
                    while (match = this_term.exec(bite)) {
                        new_bites = (i > 0) ? bites.slice(0, i) : [];
                        if (match.index > 0) {
                            new_bites.push(bite.slice(0, match.index));
                        }
                        new_bites.push(['<u>' + match[0] + '</u>']);
                        if (this_term.lastIndex < bite.length) {
                            new_bites.push(bite.slice(this_term.lastIndex));
                        }
                        if (i < bites.length - 1) {
                            new_bites = new_bites.concat(bites.slice(i + 1));
                        }
    
                        if (terms_esc.length > depth) {
                            //Attempt to find all other terms
                            found_all_others = divide_and_conquer_inner(new_bites, depth + 1);
    
                            //If we found all terms we'll pass the modified string all the
                            //way up to the original callee
                            if (found_all_others) {
                                return found_all_others;
                            }
                            //Otherwise try to match current term somewhere else
                            this_term.lastIndex = match.index + 1;
                        } else {
                            //If no terms remain we have a match
                            return new_bites.join('');
                        }
                    }
                }
            }
            //If we reach this point at least one term was not found
            return null;
        };
    
        // Split query in terms at delimiter
        terms = query.split(separator).filter(Boolean);
        if (!terms.length) return options;
    
    
        //Sort terms according to length - longest term last
        terms.sort(function(a, b) {
            return a.length - b.length;
        });
    
        //Escape terms
        //And store RegExp's instead of strings
        terms_esc = terms.map(function (term) {
            return term.replace(/[-[]{}()*+?.,^$|#s]/g, "$&");
        }).map(function (term) {
            return new RegExp(term, 'gi');
        });
    
        //Loop through each option
        return options.map(function(option){
            return divide_and_conquer_inner([option]);
        }).filter(Boolean);
    }
    
    var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People's Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People's Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];
    var separator = ' ';
    
    function divide_and_conquer(){
        var query = document.getElementById('divide_and_conquer').value;
        var res_elm = document.getElementById('divide_and_conquer_result');
        var t0 = performance.now();
        var results = divide_and_conquer_replace(query, options, separator);
        var t1 = performance.now();
        document.getElementById('divide_and_conquer_meta').innerHTML = 'Time: ' + (t1 - t0).toFixed(2) + 'ms';
        res_elm.innerHTML = '';
        for (var result of results) {
            res_elm.innerHTML += '<li>' + result + '</li>';
        }
    };
    divide_and_conquer();
    <input type="text" id="divide_and_conquer" onkeyup="divide_and_conquer()">
    <p id="divide_and_conquer_meta"></p>
    <ul style="height:300px;overflow:auto" id="divide_and_conquer_result"></ul>
    链接地址: http://www.djcxy.com/p/40088.html

    上一篇: 如何在iPython笔记本中调用使用argparse编写的模块

    下一篇: 如何从字符串数组中按任何顺序匹配和突出显示所有术语?