Text search in multilingual content in SQL Server
We have a site which supports different languages. We have millions of data so in search we would like to implement SQL Server Full-Text Search .
The table structure we have currently like below.
CREATE TABLE Product
(
ID INT IDENTITY(1,1),
Code VARCHAR(50),
........
........
)
CREATE TABLE ProductLanguage
(
ID INT,
LanguageID INT,
Name NVARCHAR(200),
........
........
)
We would like to implement Full-Text search in "Name" column so we have created Full-Text index on the Name column. But while creating Full-Text index we can select only one language per column. If we select "English" or "Neutral" its not returning expected data in other languages like Japanese, Chinese, French etc.
So what is the best way to implement Full-Text search in SQL Server for multilingual content.
Do we need to create a different table. If yes then what will be the table structure (We need to keep in mind that the Languages are not fixed, different language can be added later) and what will be search query?
We are using SQL Server 2008 R2.
Certain content (document) types support language settings - eg Microsoft Office Documents, PDF, [X]HTML, or XML.
If you change the type of your Name column to XML, you can determine the language of each value (ie per row). For instance:
Instead of storing values as strings
name 1
name 2
name 3
...you could store them as XML documents with the appropriate language declarations:
<content xml:lang="en-US">name 1</content>
<content xml:lang="fr-FR">name 2</content>
<content xml:lang="en-UK">name 3</content>
During Full-text index population the correct word breaker/stemmer will be used, based on the language settings of each value (XML document): US English for name 1, French or name 2, and UK English for name 3.
Of course, this would require a significant change in the way your data is managed and consumed.
ML
I'd be concerned about the performance of using XML instead of NVARCHAR(n) - though I have no hard proof for it. One alternative could be to use dynamic SQL (generate the language specific code on the fly), combined with language specific indexed views on the Product table. Drawback of thsi is the lack of execution plan caching, ie again: performance.
Same idea as Matija Lah's answer, but this is the suggested solution outlined in the MS whitepaper.
<meta name=“MS.locale”…>
xml:lang
attribute https://technet.microsoft.com/en-us/library/cc721269%28v=sql.100%29.aspx
链接地址: http://www.djcxy.com/p/66852.html上一篇: 使列与Full兼容