How to model multilingual entities in relational databases

If we are going to develop a multilingual application, shall we store translations in resource files or the database ?

Suppose we choose to do it in the database. Is there a standard way to model multilingual entities in the Relational Model ?

1. One Big Translation Table

We can store all the translations in one table and use language-neutral keys for the attribute values.

Person ( SSN , FirstName, LastName, Birthday)

Translation ( key , langid , translation)

2. One Translation Table For Each Entity

Person ( SSN , Birthday)

PersonML ( SSN , LangId , FirstName, LastName)

I prefer this approach. It is really a 1:N relationship .

Problem

It seems multilingual columns cannot be used to form a Primary Key .
Let's assume every person has a unique name, then (FirstName, LastName) can be used as the primary key.

Person ( FirstName , LastName , Birthday)

However, when taking multilingual into account, (FirstName, LastName) cannot identify a person.
Apparently we can't add LangId to form a primary key.

Person ( LangId , FirstName , LastName , Birthday)

In this case, one person would be stored in multiple rows and the non-key columns would be duplicated.

Do we have to use language-neutral columns for Primary Keys?
When there are no such columns, shall we use a surrogate ?
I have been told that surrogates should not be used blindly and I strongly agree.


Update 1

In the example, I assume FirstName and LastName are subject to localization.

If there is always some attribute like SSN for every entity, the second approach makes more sense.
However, some valid Primary Keys may become invalid if they contain columns that are subject to localization.

Another Example

Every company has a unique name, so CompanyName can be used as the Primary Key.

Company ( CompanyName , ...)

When it comes to localization, company name cannot be used as a primary key. We have to invent some code to represent the company.

Does it mean localization doesn't fit in the Relational Model?


Update 2

3. 1:N Relationship Between Default Language and Other Languages

Users may perceive the company table as:

Company ( CompanyNameEnglish , CompanyNameFrench, CompanyNameSpanish, ...)

Of course there are repeating groups, so it breaks 1NF.

Improved:

Company ( CompanyNameEnglish , ...)

CompanyNameML ( CompanyNameEnglish , LangId , CompanyName)

The problem is we have to provide the a default (English) Name even if it is not required by the user.
Some users may provide English names, others may provide French names ONLY.
Is this requirement too contrived?

4. DBMS Localization Support

PerformanceDBA brought this up in his comment.
I will do more research on it.


"I have been told that surrogates should not be used blindly and I strongly agree."
I also agree with that, using anything blindly is never a smart choice.

However, not every time you use surrogate key it is done blindly. Keep in mind that a primary key is not the only way to ensure uniqueness. Most if not all relational databases offers unique constraints and unique indexes, and it should be used wisely. In fact, when storing multilingual data in translation tables, using a surrogate key might be better then using a natural one. read this article for a good comparison between natural and surrogate key strategies.

To answer your question, I would go with a translation table for each entity, keeping only the entity non-textual data in the main entity table (such as birth date and gender in your person's example), and keeping the textual data in the translation table, having it's primary key composed of the language id and the entity table primary key.
Note that the primary key of the entity table in this case must be non-textual, and not language-depended.


Your question is moot : if you decide to store your translations in "resource files", then that set of resource files IS the database (or part of it).

The more relevant questions to answer are, eg : who is the owner of the translations (ie are the translations integral part of the software package or not - can end users customize them) ? The answers to questions such as those will drive whether your translations can go as resource files shipped with the binaries or not.

I'm not giving any real answer, because the only one who can know the decisive factors is you. I'm just pointing out the deeper kind of questions to be answered which will make it clear.

链接地址: http://www.djcxy.com/p/37446.html

上一篇: 实体框架数据库第一多

下一篇: 如何在关系数据库中为多语言实体建模