Php – Implementing internationalization (language strings) in a PHP application

globalizationicuinternationalizationPHP

I want to build a CMS that can handle fetching locale strings to support internationalization. I plan on storing the strings in a database, and then placing a key/value cache like memcache in between the database and the application to prevent performance drops for hitting the database each page for a translation.

This is more complex than using PHP files with arrays of strings – but that method is incredibly inefficient when you have 2,000 translation lines.

I thought about using gettext, but I'm not sure that users of the CMS will be comfortable working with the gettext files. If the strings are stored in a database, then a nice administration system can be setup to allow them to make changes whenever they want and the caching in RAM will insure that the fetching of those strings is as fast, or faster than gettext. I also don't feel safe using the PHP extension considering not even the zend framework uses it.

Is there anything wrong with this approach?

Update

I thought perhaps I would add more food for thought. One of the problems with string translations it is that they doesn't support dates, money, or conditional statements. However, thanks to intl PHP now has MessageFormatter which is what really needs to be used anyway.

// Load string from gettext file
$string = _("{0} resulted in {1,choice,0#no errors|1#single error|1<{1, number} errors}");

// Format using the current locale
msgfmt_format_message(setlocale(LC_ALL, 0), $string, array('Update', 3));

On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.

So, it makes sense to use keys like _('error.404_not_found') which then allow the content writers and translators to just worry about the PO/MO files without messing in the code.

However, in the event that a gettext translation doesn't exist for the given key then there is no way to fall back to a default (like you could with a custom handler). This means that you either have the writter mucking around in your code – or have "error.404_not_found" shown to users that don't have a locale translation!

In addition, I am not aware of any large projects which use PHP's gettext. I would appreciate any links to well-used (and therefore tested), systems which actually rely on the native PHP gettext extension.

Best Answer

Gettext uses a binary protocol that is quite quick. Also the gettext implementation is usually simpler as it only requires echo _('Text to translate');. It also has existing tools for translators to use and they're proven to work well.

You can store them in a database but I feel it would be slower and a bit overkill, especially since you'd have to build the system to edit the translations yourself.

If only you could actually cache the lookups in a dedicated memory portion in APC, you'd be golden. Sadly, I don't know how.