Is full text search suitable for searching people's names?

I have a user table with a single column for a person's name:

CREATE TABLE [dbo].[Users]
(
    Id bigint NOT NULL,
    Name nvarchar(80) NOT NULL,
    PRIMARY KEY CLUSTERED (Id ASC)
)

The Name column can contain either a full name or just the first name or anything really (separated by spaces). In order to implement a search on Name , I would like to utilize SQL's full-text search, but not sure if it's suitable for searching names/nicknames and not actual words. Also the question is - which language do I choose when creating the FT index on Name ?

Any other considerations?

Thank you.


At first sight, I would recommend using the LIKE operator rather than a full-text query.

Make sure you search case insensitive and probably accent insensitive. This can be achieved by the setting the correct collation on either the server, database, the table column or in your query. In the query, this is done by something like:

SELECT *
FROM [dbo].[Users]
WHERE Name LIKE '%niaher%' COLLATE SQL_Latin1_General_CP1_CI_AI

If you use a full-text index, you get all sorts of features, like verb stemming and thesaurus, see Linguistic Components and Language Support in Full-Text Search, which you do not need, when searching in a list of names. By the way, these features are language dependent and that's why you specify a language on the full-text index.

Application of a stoplist you might even want to avoid. At least I would, since in Dutch many surnames start with articles and/or prepositions: "Rembrandt van Rijn". "van" would be in a Dutch stoplist for sure and prevent any match on a search term that contains "van".

If you run into performance issues, it might be useful to try a full-text index and search using CONTAINS with a simple term.

SELECT *
FROM [dbo].[Users]
WHERE CONTAINS(Name, 'niaher')

Please note that full-text indexes are updated asynchronously.


It seems that if you want to search multi-part names, full-text search is the easiest and most appropriate approach (please correct me if I'm wrong). The other alternative being LIKE '%query%' , however it has too many disadvantages:

  • Terrible performance, since it does index scan
  • Order of terms matters, eg - searching for "John Smith" and "Smith John" will return different results.
  • It disregards word boundaries, eg - searching for "Ann" will also retrieve "Joanna" and "Danny", which aren't useful matches.
  • So I went ahead and implemented a full-text search. My queries look something like this:

    SELECT * FROM Users WHERE CONTAINS(Name, '"John*"')
    

    The only slight difficulty is that I had to convert user query (John) into a CONTAINS-friendly query ("John*"). To do that, I implemented this method in my UserRepository:

    /// <summary>
    /// Converts user-entered search query into a query that can be consumed by CONTAINS keyword of SQL Server.
    /// </summary>
    /// <example>If query is "John S Ju", the result will be ""John*" AND "S*" AND "Ju*"".</example>
    /// <param name="query">Query entered by user.</param>
    /// <returns>String instance.</returns>
    public static string GetContainsQuery(string query)
    {
        string containsQuery = string.Empty;
    
        var terms = query.Split(new[] { ' ' }, StringSplitOptions.None);
    
        if (terms.Length > 1)
        {
            for (int i = 0; i < terms.Length; i++)
            {
                string term = terms[i].Trim();
    
                // Add wildcard term, e.g. - "term*". The reason to add wildcard is because we want
                // to allow search by partially entered name parts (partially entered first name and/or
                // partially entered last name, etc).
                containsQuery += """ + term + "*"";
    
                // If it's not the last term.
                if (i < terms.Length - 1)
                {
                    // We want all terms inside user query to match.
                    containsQuery += " AND ";
                }
            }
    
            containsQuery = containsQuery.Trim();
        }
        else
        {
            containsQuery = """ + query + "*"";
        }
    
        return containsQuery;
    }
    

    Hope this helps anyone stumbling into the same issue.

    PS - I wrote a blogpost documenting this.

    链接地址: http://www.djcxy.com/p/11466.html

    上一篇: 截取获取的结果控制器中断搜索显示

    下一篇: 全文搜索适合搜索人名吗?