Implementation of Levenshtein distance for mysql/fuzzy search?
I would like to be able to search a table as follows for smith as get everything that it within 1 variance.
Data:
O'Brien Smithe Dolan Smuth Wong Smoth Gunther Smiht
I have looked into using Levenshtein distance does anyone know how to implement this with it?
In order to efficiently search using levenshtein distance, you need an efficient, specialised index, such as a bk-tree. Unfortunately, no database system I know of, including MySQL, implements bk-tree indexes. This is further complicated if you're looking for full-text search, instead of just a single term per row. Off-hand, I can't think of any way that you could do full-text indexing in a manner that allows for searching based on levenshtein distance.
An implementation for the damerau-levenshtein distance can be found here: Damerau-Levenshtein algorithm: Levenshtein with transpositions The improvement over pure Levenshtein distance is that the swapping of characters is considered. I found it in the comments of schnaader's link, thanks!
There is a mysql UDF implementation of Levenshtein Distance function
https://github.com/jmcejuela/Levenshtein-MySQL-UDF
It is implemented in C and has better performance than the "MySQL Levenshtein distance query" mentioned by schnaader
链接地址: http://www.djcxy.com/p/75300.html