Comparing relevance scores from MySQL full-text searches across different tables

Scenario:

I have 5 tables, all of which need to be searched. Each one has a proper full-text index. I can search each table individually using MATCH ... AGAINST and order the results by relevance score.

The problem is that I want to combine and interleave the search results from all 5 tables, ordered by relevance score. Like so:

(SELECT *, MATCH(column) AGAINST (query) as score
FROM table1
WHERE MATCH (column) AGAINST (query))
UNION
(SELECT *, MATCH(column) AGAINST (query) as score
FROM table2
WHERE MATCH (column) AGAINST (query))
UNION
...
ORDER BY score DESC

This works fine, except that table 1 may have twice as many rows as table 2. Because MySQL takes the uniqueness of a term within the searched table into account when computing relevance, the scores for results from table 1 are often significantly higher than the scores for results from table 2.

Ultimately: how can I normalize the scores of the results from the 5 tables of varying size, so that results from each table are weighted equally?


The fact that you are UNIONing the results from the five tables suggests that you should probably merge them into a single table (perhaps with an additional column that identifies which of the five types of data each row holds).

Alternatively, you could store just the text column in a single table, like this one:

CREATE TABLE text_table (
    text_col TEXT,
    fk INT, -- references the PK of an item in either table1, or table2, or...
    ref_table INT, -- identifies the related table, e.g. 1 means 'table1', etc.
    FULLTEXT INDEX (text_col)
)

Then you could run the full-text search on this table. JOINing the results with the actual data tables should be straightforward.
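
A rough sketch of what those two steps might look like. The search string 'search terms' and the primary-key column table1.id are assumptions made for illustration; they are not taken from the question.

```sql
-- Search the shared text table once. Every score comes from the same
-- full-text index, so the scores are directly comparable.
SELECT t.ref_table,
       t.fk,
       MATCH(t.text_col) AGAINST ('search terms') AS score
FROM text_table AS t
WHERE MATCH(t.text_col) AGAINST ('search terms')
ORDER BY score DESC;

-- Fetch the full rows for hits that belong to table1 (ref_table = 1).
-- table1.id is an assumed primary-key column name.
SELECT t1.*,
       MATCH(t.text_col) AGAINST ('search terms') AS score
FROM text_table AS t
JOIN table1 AS t1
  ON t1.id = t.fk
WHERE t.ref_table = 1
  AND MATCH(t.text_col) AGAINST ('search terms')
ORDER BY score DESC;
```

The second query would be repeated for each of the other data tables, changing the ref_table value and the joined table.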


As a note:

The suggestions above by YaK are likely the best options for most people asking this question. The route I actually took was to record the average highest relevance score for each of the 5 tables. I then divide all future relevance scores by that table's factor in an attempt to 'normalize' the scores, so that they can be compared with the relevance scores from the other tables. Thus far it has worked well, but not perfectly (particularly for large queries).
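
For anyone wanting to try the same thing, here is a minimal sketch of how the division could be done in the query itself. It is not the exact code I used. The score_factors table, the id and text_col column names, and the search string are all hypothetical, and the stored factors are assumed to be greater than zero.

```sql
-- Hypothetical lookup table holding one normalization factor per table:
-- the recorded average top relevance score for that table.
CREATE TABLE score_factors (
    ref_table     INT PRIMARY KEY,   -- 1 means 'table1', 2 means 'table2', etc.
    avg_top_score DOUBLE NOT NULL    -- assumed to be > 0
);

-- Divide each raw score by its table's factor before merging,
-- then sort the combined result on the normalized score.
(SELECT 'table1' AS source,
        t.id,
        MATCH(t.text_col) AGAINST ('search terms') / f.avg_top_score AS score
 FROM table1 AS t
 JOIN score_factors AS f ON f.ref_table = 1
 WHERE MATCH(t.text_col) AGAINST ('search terms'))
UNION ALL
(SELECT 'table2' AS source,
        t.id,
        MATCH(t.text_col) AGAINST ('search terms') / f.avg_top_score AS score
 FROM table2 AS t
 JOIN score_factors AS f ON f.ref_table = 2
 WHERE MATCH(t.text_col) AGAINST ('search terms'))
ORDER BY score DESC;
```

UNION ALL is used here because the source column already keeps rows from different tables distinct, so there is nothing to de-duplicate.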
