Schema for a multilanguage database

database-designlocalizationmultilingual

I'm developing a multilanguage software. As far as the application code goes, localizability is not an issue. We can use language specific resources and have all kinds of tools that work well with them.

But what is the best approach in defining a multilanguage database schema? Let's say we have a lot of tables (100 or more), and each table can have multiple columns that can be localized (most of nvarchar columns should be localizable). For instance one of the tables might hold product information:

CREATE TABLE T_PRODUCT (
  NAME        NVARCHAR(50),
  DESCRIPTION NTEXT,
  PRICE       NUMBER(18, 2)
)

I can think of three approaches to support multilingual text in NAME and DESCRIPTION columns:

  1. Separate column for each language

    When we add a new language to the system, we must create additional columns to store the translated text, like this:

    CREATE TABLE T_PRODUCT (
      NAME_EN        NVARCHAR(50),
      NAME_DE        NVARCHAR(50),
      NAME_SP        NVARCHAR(50),
      DESCRIPTION_EN NTEXT,
      DESCRIPTION_DE NTEXT,
      DESCRIPTION_SP NTEXT,
      PRICE          NUMBER(18,2)
    )
    
  2. Translation table with columns for each language

    Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains a column for each language.

    CREATE TABLE T_PRODUCT (
      NAME_FK        int,
      DESCRIPTION_FK int,
      PRICE          NUMBER(18, 2)
    )
    
    CREATE TABLE T_TRANSLATION (
      TRANSLATION_ID,
      TEXT_EN NTEXT,
      TEXT_DE NTEXT,
      TEXT_SP NTEXT
    )
    
  3. Translation tables with rows for each language

    Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains only a key, and a separate table contains a row for each translation to a language.

    CREATE TABLE T_PRODUCT (
      NAME_FK        int,
      DESCRIPTION_FK int,
      PRICE          NUMBER(18, 2)
    )
    
    CREATE TABLE T_TRANSLATION (
      TRANSLATION_ID
    )
    
    CREATE TABLE T_TRANSLATION_ENTRY (
      TRANSLATION_FK,
      LANGUAGE_FK,
      TRANSLATED_TEXT NTEXT
    )
    
    CREATE TABLE T_TRANSLATION_LANGUAGE (
      LANGUAGE_ID,
      LANGUAGE_CODE CHAR(2)
    )
    

There are pros and cons to each solution, and I would like to know what are your experiences with these approaches, what do you recommend and how would you go about designing a multilanguage database schema.

Best Solution

What do you think about having a related translation table for each translatable table?

CREATE TABLE T_PRODUCT (pr_id int, PRICE NUMBER(18, 2))

CREATE TABLE T_PRODUCT_tr (pr_id INT FK, languagecode varchar, pr_name text, pr_descr text)

This way if you have multiple translatable column it would only require a single join to get it + since you are not autogenerating a translationid it may be easier to import items together with their related translations.

The negative side of this is that if you have a complex language fallback mechanism you may need to implement that for each translation table - if you are relying on some stored procedure to do that. If you do that from the app this will probably not be a problem.

Let me know what you think - I am also about to make a decision on this for our next application. So far we have used your 3rd type.

Related Question