
Translation / localization issues

Some ideas on how translation of the software can be improved in the future. I (Jeroen) started a discussion on this topic on the mailing list (see: http://www.nabble.com/Localization-issues-tf4097440s18419.html ).


Update 25-02-2008: While discussing this with Stefano Costa, I was pointed to Pootle and came across the XLIFF format, which seems a good direction for us.

Adopting it will require rewriting the proposal below to fit that more standard approach.

I can see some relation to translated metadata content too :-)


The current structure of our language files is as follows:

English:

<strings>
        <about>About</about>
        <abstract>Abstract</abstract>
        <accept>Accept</accept>
        <add>add</add>
        <addNewMetadata>Add new metadata</addNewMetadata>
</strings>

Arabic:

<strings>
        <about>عن البرنامج</about>
        <abstract>مقدمة</abstract>
        <accept>قبول</accept>
        <add>إضافة</add>
        <addNewMetadata>بيانات أساسية Metadata جديدة</addNewMetadata>
</strings>

A possible solution to improve this structure is:

English:

<strings>
        <string i18n_key="about">About</string>
        <string i18n_key="abstract">Abstract</string>
        <string i18n_key="accept">Accept</string>
        <string i18n_key="add">add</string>
        <string i18n_key="addNewMetadata">Add new metadata</string>
</strings>

Arabic:

<strings>
        <string i18n_key="about">عن البرنامج</string>
        <string i18n_key="abstract">مقدمة</string>
        <string i18n_key="accept">قبول</string>
        <string i18n_key="add">إضافة</string>
        <string i18n_key="addNewMetadata">بيانات أساسية Metadata جديدة</string>
</strings>
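
The conversion from the current layout to the proposed one could be sketched roughly as follows. This is only an illustration in Python, not an existing GeoNetwork tool: the function name to_keyed_format is made up here, and the sketch assumes flat, text-only entries like the ones in the examples above (any nested markup inside an entry would be lost).

```python
import xml.etree.ElementTree as ET


def to_keyed_format(xml_text):
    """Turn <strings><about>..</about></strings> into
    <strings><string i18n_key="about">..</string></strings>.

    Assumes every child of <strings> is a flat, text-only element.
    """
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element("strings")
    for child in old_root:
        # The old element name becomes the value of the i18n_key attribute.
        entry = ET.SubElement(new_root, "string", {"i18n_key": child.tag})
        entry.text = child.text
    # encoding="unicode" returns a str and keeps non-ASCII text readable.
    return ET.tostring(new_root, encoding="unicode")
```

Calling to_keyed_format on the English sample above would produce one `<string>` element per original entry, in the same order.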

This can have the following advantages:

  • This has minimal impact on our existing stylesheets that generate the localized output.
  • It is pretty trivial to generate a list of i18n_keys in use in the software.
  • The generated output can be used to highlight missing and obsolete keys in each language file.
  • There is the potential to store all keys and language strings in a database with a simple, web-accessible translation interface. Such a translation mechanism has already been implemented in GeoNetwork opensource for the non-static strings in the system. It could be set up on the developer website to work as the central translation facility. The static files required for a release could then be generated automatically as language packages and stored in SVN. This would require an import function to merge new strings in and flag missing ones, and an XSL-based export function to generate the appropriate output files.
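
As a rough illustration of the second and third points, the comparison between the keys in use and one language file in the proposed format could look like the Python sketch below. How the list of keys in use is extracted from the stylesheets is not covered here, so it is simply passed in as a set; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def keys_in_language_file(xml_text):
    """Return the set of i18n_key values found in a proposed-format file."""
    root = ET.fromstring(xml_text)
    # Only <string> entries that actually carry an i18n_key attribute count.
    return {entry.get("i18n_key") for entry in root.findall("string[@i18n_key]")}


def compare_keys(keys_in_use, xml_text):
    """Return (missing, obsolete) key lists for one language file.

    missing:  keys the software uses but the file does not translate
    obsolete: keys the file translates but the software no longer uses
    """
    translated = keys_in_language_file(xml_text)
    in_use = set(keys_in_use)
    return sorted(in_use - translated), sorted(translated - in_use)
```

The two returned lists are what a translator would need to see: the first tells them what still has to be translated, the second what they can safely skip.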

Update 26-02-2008 by Heikki

The original statement of the problem, in the mailing list referred to above, is this:

The problem: With the expanding number of languages it becomes more problematic to keep all language files synchronized. It also is not trivial what strings have become obsolete, causing people to translate text that actually is not used anymore.

So the problem is really about synchronization - not about the format of the i18n files.

What is needed is a check that each localization string exists in each supported language file, no more and no less.

To me it seems there are plenty of editors available that help do this; for example, copy your files into a spreadsheet editor, sort the keys alphabetically, and there you go. If you want a procedure that does the check automatically, e.g. before making a release, it still seems simple enough to write a small checker ourselves. OK, I'm disregarding fancy ideas like "deprecating" translations.

What in my view we don't need is a tool that allows putting style information, like <b>, in a translation system. But this is exactly what IBM presents as an advantage of XLIFF: "Fortunately, the XLIFF standard includes attributes for specifying string position in a dialogue, font type and size used for the text, and many other details. A tool specifically designed for software localisation can be used to visually adjust the dialogue layout during translation." (from "XML in localisation: A practical analysis").

Even though this addresses a real concern, I would still advocate an approach where the user interface (as in HTML and CSS) is as decoupled as possible from anything else, in this case the translation system. To begin with, I don't think the GeoNetwork practice of generating HTML from XSLT is a very good thing, and things will only become more like spaghetti if we put style information in a translation system that your front-end developer shouldn't need to know about.

In my experience the simple approach of Struts has always worked fine, even if it lacks a synchronization check: all i18n strings are resolved through the default translation file, unless your Java Locale is set to something else. In that case the localized translation file (recognized by the ISO language code appended to the default file name) is consulted first to look up the translation. No <use> <of> <xml> <where> <this> <isn't> <necessary>, just plain old property file syntax.
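
To make the lookup order concrete, here is a small Python sketch of that fallback idea. It is not Struts code: the base file name "messages", the function names, and the in-memory dictionary standing in for files on disk are all assumptions for illustration, and the parser only understands plain key=value lines and comment lines.

```python
def parse_properties(text):
    """Parse a minimal subset of property file syntax.

    Handles key=value lines and '#' or '!' comment lines only;
    escapes and line continuations are not supported.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def translate(key, language, files, base="messages"):
    """Look up key in '<base>_<language>.properties' first,
    then fall back to the default '<base>.properties'."""
    for name in (f"{base}_{language}.properties", f"{base}.properties"):
        if name in files:
            entries = parse_properties(files[name])
            if key in entries:
                return entries[key]
    raise KeyError(key)
```

With a default file and an Arabic file that only translates some keys, translate returns the Arabic string where one exists and the default string otherwise, which is also why a missing translation goes unnoticed without a separate synchronization check.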

To address the original problem (synchronization / consistency checking) I think the easiest solution is to settle for one format, possibly the existing one, and either find or else write a little checker that could be made part of the build process to ensure nothing is lacking or superfluous in the supported languages.
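
A little checker of that kind, for the existing format, might look like the Python sketch below. It is a sketch under assumptions rather than a finished tool: it treats one file (for example the English one) as the reference, looks only at the element names directly under <strings>, and the function names and message wording are made up here.

```python
import xml.etree.ElementTree as ET


def load_keys(path):
    """Return the element names directly under the root <strings> element."""
    return {child.tag for child in ET.parse(path).getroot()}


def check(reference_path, translation_paths):
    """Compare each translation file with the reference file.

    Returns a list of human-readable problem messages (empty if all is well).
    """
    reference = load_keys(reference_path)
    problems = []
    for path in translation_paths:
        keys = load_keys(path)
        for key in sorted(reference - keys):
            problems.append(f"{path}: missing key '{key}'")
        for key in sorted(keys - reference):
            problems.append(f"{path}: superfluous key '{key}'")
    return problems


def main(argv):
    """argv[0] is the reference file, the rest are translation files.

    Returns 0 when everything is in sync and 1 otherwise, so a build
    script can fail on a non-zero result.
    """
    problems = check(argv[0], argv[1:])
    for message in problems:
        print(message)
    return 1 if problems else 0
```

A build step could call main with the reference file followed by the other language files and stop the release when the result is non-zero.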

Last modified on Feb 25, 2008, 6:53:09 PM