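Full-text search
I have a database with 80,000 rows, and when I tested some FULLTEXT queries I got unexpected results. I have removed the stopwords from MySQL and set the minimum word length to 3.
When I run this query: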
SELECT `sentence`, MATCH (`sentence`) AGAINST ('CAN YOU FLY') AS `relevance`
FROM `sentences`
WHERE MATCH (`sentence`) AGAINST ('CAN YOU FLY')
ORDER BY `relevance` DESC
It returns this result:
NO A FLY WITHOUT WINGS WOULD BE CALLED A WINGLESS | 10.623517036438
I CAN FLY | 7.61278629302979
I CAN FLY :) | 7.61278629302979
CAN YOU FLY? | 7.61278629302979
THEY CAN FLY | 7.61278629302979
YOU AM NOT FLY | 7.61278629302979
CAN YOU FLY | 7.61278629302979
HAVE YOU EVER SWALLOWED A FLY? | 7.52720737457275
I JUST WANNA FLY | 7.52720737457275
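Why does "NO A FLY WITHOUT WINGS WOULD BE CALLED A WINGLESS" get the highest relevance when it contains only one of the words? And why isn't "CAN YOU FLY", which is an exact match, at the top?
I would like the rows ranked by the most matching keywords first, then by whether the keywords appear in the same order as in the query, and then by the fewest words. That would produce this, more logical, result: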
CAN YOU FLY
CAN YOU FLY?
I CAN FLY
THEY CAN FLY
I CAN FLY :)
YOU AM NOT FLY
HAVE YOU EVER SWALLOWED A FLY?
I JUST WANNA FLY
NO A FLY WITHOUT WINGS WOULD BE CALLED A WINGLESS
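The ordering described above can be approximated outside MySQL by re-ranking the rows that the FULLTEXT query returns. This is a minimal, hypothetical Python sketch, not a MySQL feature: `rerank` and `rank_key` are illustrative names, "in order" is read as "the matched keywords appear in the same relative order as in the query", and the tie-breakers (whitespace-separated word count, then character length, with punctuation stripped before matching) are assumptions chosen for this example.

```python
import re

# Hypothetical client-side re-ranking of FULLTEXT matches.
# Assumptions: tokens are compared case-insensitively after stripping
# non-alphanumeric characters; "fewest words" means whitespace-separated
# tokens, and character length is the final tie-breaker.


def normalize(token: str) -> str:
    """Uppercase a token and drop punctuation, e.g. 'fly?' -> 'FLY'."""
    return re.sub(r"[^A-Z0-9]", "", token.upper())


def rank_key(sentence: str, keywords: list[str]) -> tuple[int, int, int, int]:
    query = [normalize(k) for k in keywords]
    # Query positions of the keywords, in the order they first occur in the sentence.
    seen: list[int] = []
    for tok in (normalize(t) for t in sentence.split()):
        if tok in query:
            idx = query.index(tok)
            if idx not in seen:
                seen.append(idx)
    in_order = seen == sorted(seen)
    # Smaller tuples sort first: more matches, in query order, fewer words, shorter text.
    return (-len(seen), 0 if in_order else 1, len(sentence.split()), len(sentence))


def rerank(sentences: list[str], query: str) -> list[str]:
    keywords = query.split()
    return sorted(sentences, key=lambda s: rank_key(s, keywords))


rows = [
    "NO A FLY WITHOUT WINGS WOULD BE CALLED A WINGLESS",
    "I CAN FLY",
    "I CAN FLY :)",
    "CAN YOU FLY?",
    "THEY CAN FLY",
    "YOU AM NOT FLY",
    "CAN YOU FLY",
    "HAVE YOU EVER SWALLOWED A FLY?",
    "I JUST WANNA FLY",
]
for s in rerank(rows, "CAN YOU FLY"):
    print(s)
```

Applied to the nine rows of the result above, this key produces the hand-made ordering; on other data the assumed tie-breakers may need adjusting.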
The formula used to calculate the relevance can be found in the MySQL Internals Manual:
w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)
where
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
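The first sentence clearly contains more words than the others, and the formula depends heavily on U, the number of unique terms in the document.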
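A minimal Python transcription of the formula above, for trying it on concrete token counts. Assumptions: natural log (the formula does not state a base); a document is passed as the list of counts of its distinct indexed terms, so U = len(doc_tfs); and `NF_FLY = 500` is a made-up document frequency for "FLY". Only N = 80,000 comes from the question. The term lists keep words of at least 3 characters, following the question's setting, and ignore the rest of MySQL's parser.

```python
import math


def term_weight(dtf: int, doc_tfs: list[int], n_docs: int, n_with_term: int) -> float:
    """w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

    dtf         -- occurrences of the term in this document
    doc_tfs     -- occurrence count of every distinct term in the document (U = len)
    n_docs      -- N, total number of documents
    n_with_term -- nf, number of documents containing the term
    Natural log is an assumption.
    """
    u = len(doc_tfs)
    sumdtf = sum(math.log(tf) + 1 for tf in doc_tfs)
    local = (math.log(dtf) + 1) / sumdtf
    length_norm = u / (1 + 0.0115 * u)
    idf = math.log((n_docs - n_with_term) / n_with_term)
    return local * length_norm * idf


N, NF_FLY = 80_000, 500  # NF_FLY is hypothetical
short = term_weight(1, [1, 1, 1], N, NF_FLY)  # "CAN YOU FLY": CAN, YOU, FLY
long_ = term_weight(1, [1] * 6, N, NF_FLY)    # FLY, WITHOUT, WINGS, WOULD, CALLED, WINGLESS
print(f"FLY in short sentence: {short:.4f}, in long sentence: {long_:.4f}")
```

When every term occurs exactly once, log(1)+1 = 1 for any log base, so sumdtf equals U and the first two factors reduce to 1/(1+0.0115*U).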
Based on your comments, I would suggest using a boolean full-text search:
SELECT `sentence`, MATCH (`sentence`) AGAINST ('CAN YOU FLY' IN BOOLEAN MODE) AS `relevance`
FROM `sentences`
WHERE MATCH (`sentence`) AGAINST ('CAN YOU FLY' IN BOOLEAN MODE)
ORDER BY `relevance` DESC