Localization for developers

Revision as of 09:55, 6 February 2012

This page attempts to explain the technical aspects of the Apache OpenOffice localization workflow.

Overview

Localization, often abbreviated as l10n (there are ten letters between the leading L and the closing N), is a process with several steps:

Content Creation: Write code, help files, or any other content that needs localization.
Extraction: Once in a while (for every milestone), the localize_sl script is used to extract the strings that need to be localized.
Upload: The sdf file created by localize_sl is uploaded to the Pootle server and converted into po files.
Translation: Translation takes place either directly via the Pootle server's HTML front end or via an offline editor.
Download: The po files are eventually downloaded from the Pootle server and converted into sdf files.
Integration: When the office is built with the configure switch --with-lang="...", the English strings are replaced by translated strings from the localize.sdf files. The result is either a localized install set ready to use or a language pack that can be applied to an already installed office.

Details

Content Creation

Write text that needs to be localized. This can be help files, configuration files (.xcu), or resource files (.rsc). Source code does not contain localizable strings directly but uses resource files for that.

Extraction

Once in a while (for every milestone), run solenv/bin/localize_sl. It forwards the call to solver/340/<platform>/bin/localize_sl<.exe>, which in turn forwards it to solver/340/<platform>/bin/localize<.exe>.

localize walks the source tree looking for files that may contain strings that need localization. Each file found is processed by one of several extractors (implemented in a variety of languages: C++, Python, Java). The result is a single sdf file.
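The walk-and-dispatch step can be sketched in a few lines of Python. This is a minimal illustration, not the logic of the localize binary: only transex3 (src files) and jpropex (Java property files) are named on this page, while the extension table, the extra extensions, and the names xcu_extractor and help_extractor are hypothetical placeholders.

```python
import os
from collections import defaultdict

# HYPOTHETICAL extension-to-extractor table, for illustration only.
# The real localize tool uses its own hard-coded lists.
EXTRACTORS = {
    ".src": "transex3",
    ".properties": "jpropex",
    ".xcu": "xcu_extractor",   # placeholder name
    ".xhp": "help_extractor",  # placeholder name
}


def group_by_extractor(paths):
    """Map each extractor name to the list of paths it should process."""
    jobs = defaultdict(list)
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        extractor = EXTRACTORS.get(ext)
        if extractor is not None:
            jobs[extractor].append(path)
    return dict(jobs)


def find_localizable(root):
    """Walk the tree below root in a deterministic order and group the files."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk honours in-place changes to dirnames
        paths.extend(os.path.join(dirpath, name) for name in sorted(filenames))
    return group_by_extractor(paths)
```

Grouping the files first means each extractor could be started once per file list instead of once per file, which is one way to reduce the process-creation overhead discussed in the critique.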

A typical call to localize looks like this:

   localize -e -l en-US -f foo.sdf

The resulting foo.sdf.main (where does the .main suffix come from?) currently (SVN revision 1237934) has 72,556 lines and 13,063,597 bytes. 45,302 of these lines (9,026,966 bytes) belong to the helpcontent2 module.

At the moment, localize reports errors: jpropex, a shell script that calls a Java program, does not run.

Upload

The sdf file created by localize is uploaded to the Pootle server and converted into po files (not necessarily in this order), where it is probably merged into the existing po files.
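To show what the sdf-to-po conversion has to do, here is a minimal sketch that turns one en-US SDF line into a PO template entry. The 15-column layout (language in the 10th field, text in the 11th) is an assumption based on the commonly described OpenOffice.org SDF format, not something this page documents; the msgctxt key and the simplified escaping are likewise hypothetical and need not match the converter actually used for Pootle.

```python
# ASSUMED column layout of an SDF line (15 tab-separated fields).
SDF_FIELDS = (
    "project", "sourcefile", "dummy", "resourcetype", "groupid",
    "localid", "helpid", "platform", "width", "languageid",
    "text", "helptext", "quickhelptext", "title", "timestamp",
)


def parse_sdf_line(line):
    """Split an SDF line into a dict keyed by the assumed field names."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != len(SDF_FIELDS):
        raise ValueError(f"expected {len(SDF_FIELDS)} fields, got {len(parts)}")
    return dict(zip(SDF_FIELDS, parts))


def po_quote(s):
    """Quote a string for PO output (simplified: only \\ and \" escaped)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sdf_to_po_entry(line):
    """Turn one en-US SDF line into a PO template entry with empty msgstr."""
    rec = parse_sdf_line(line)
    # HYPOTHETICAL context key built from the location fields.
    ctxt = "|".join(rec[k] for k in ("project", "sourcefile", "resourcetype", "groupid", "localid"))
    return (
        f"msgctxt {po_quote(ctxt)}\n"
        f"msgid {po_quote(rec['text'])}\n"
        'msgstr ""\n'
    )
```

A real converter also has to undo SDF-specific escapes and handle the helptext, quickhelptext, and title columns; the sketch only covers the main text column.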

The helpcontent2 module is handled separately from the other modules to avoid disheartening translators who work on the UI part (everything except helpcontent2): because helpcontent2 contains a much larger number of strings, they would otherwise see little progress.
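The effect of the split is easy to see numerically. The sketch below takes hypothetical (module, is_translated) pairs, which is not Pootle's actual data model, and reports progress for the UI and help parts separately as well as combined.

```python
def progress(entries, help_module="helpcontent2"):
    """Return translated fractions for UI strings, help strings, and both.

    entries: iterable of (module, is_translated) pairs (HYPOTHETICAL input
    format; Pootle computes its own statistics).
    """
    counts = {"ui": [0, 0], "help": [0, 0]}  # [translated, total]
    for module, translated in entries:
        bucket = counts["help" if module == help_module else "ui"]
        bucket[1] += 1
        bucket[0] += bool(translated)
    done = counts["ui"][0] + counts["help"][0]
    total = counts["ui"][1] + counts["help"][1]
    result = {k: (t / n if n else 0.0) for k, (t, n) in counts.items()}
    result["combined"] = done / total if total else 0.0
    return result
```

With a fully translated UI but untranslated help, the combined figure stays low (0.1 for 10 UI strings and 90 help strings), while the separate UI figure shows 1.0. Given that helpcontent2 accounts for 45,302 of the 72,556 extracted lines, the combined number would hide most UI progress.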

Translation

Translation takes place either directly via the Pootle server's HTML front end or via an offline editor.

Download

The po files are eventually downloaded from the Pootle server, converted into sdf (or converted and then downloaded), and integrated into the localize.sdf files in extras/l10n/source/<language>/.
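The integration into localize.sdf can be pictured as deriving a translated SDF line from each en-US line and then replacing any existing line for the same entry. The sketch below is hypothetical: it assumes the 15-column SDF layout with the language in the 10th field and the text in the 11th, and it keys translations on the English text only, whereas real tools key on the full entry.

```python
LANG_COL, TEXT_COL = 9, 10  # ASSUMED 0-based positions of languageid and text


def localize_sdf_line(en_line, lang, translations):
    """Derive a translated SDF line from an en-US one, or None if untranslated.

    translations maps English text to translated text (a simplification).
    """
    fields = en_line.rstrip("\r\n").split("\t")
    translated = translations.get(fields[TEXT_COL])
    if not translated:
        return None
    fields[LANG_COL] = lang
    fields[TEXT_COL] = translated
    return "\t".join(fields)


def entry_key(fields):
    """Identify an entry by its location fields plus its language (assumed)."""
    return tuple(fields[:LANG_COL + 1])


def merge_into_localize_sdf(existing_lines, new_lines):
    """Replace existing entries with new ones for the same key; append the rest."""
    merged = {}  # dicts keep insertion order, so existing lines keep their place
    for line in list(existing_lines) + list(new_lines):
        fields = line.rstrip("\r\n").split("\t")
        merged[entry_key(fields)] = "\t".join(fields)
    return list(merged.values())
```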

Integration

When the office is built with the configure switch --with-lang="...", extras/l10n is built and the localize.sdf files are rearranged: in extras/l10n/source they are grouped by language, whereas afterwards they are grouped by module (and directory). The sdf files in extras/l10n/<platform>/misc/sdf are zipped into one archive per module, delivered into main/solver/340/<platform>/sdf/<module>.zip, and then forgotten (at least as far as the processing of src files is concerned).
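The regrouping from "per language" to "per module and directory", followed by one zip archive per module, might look like the sketch below. It assumes that the module name is the first SDF field and the source file path the second, and the member name localize.sdf inside each directory is a hypothetical choice, not necessarily what the build actually writes.

```python
import io
import zipfile
from collections import defaultdict


def regroup_by_module(per_language):
    """{lang: [sdf lines]} -> {module: {directory: [sdf lines]}}.

    ASSUMPTION: field 0 is the module and field 1 the source file path.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for lang in sorted(per_language):
        for line in per_language[lang]:
            fields = line.rstrip("\r\n").split("\t")
            module, sourcefile = fields[0], fields[1]
            directory = sourcefile.replace("\\", "/").rpartition("/")[0]
            grouped[module][directory].append("\t".join(fields))
    return {module: dict(dirs) for module, dirs in grouped.items()}


def zip_module(directories):
    """Pack one module's regrouped lines into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for directory in sorted(directories):
            member = (directory + "/" if directory else "") + "localize.sdf"
            zf.writestr(member, "\n".join(directories[directory]) + "\n")
    return buf.getvalue()

# Usage (not executed here): write zip_module(dirs) for each module to
# main/solver/340/<platform>/sdf/<module>.zip.
```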

Resource files (src files) are processed when the other modules are built. The original src files contain strings only for en_US in lines that look like

   Text [en_US] = "...";

transex3 adds the missing languages by adding lines like

   Text [de] = "...";

By default, all available languages are added, not just the ones given to configure's --with-lang switch. The augmented src files are placed in <module>/<platform>/misc/... and are then aggregated into srs files in <module>/<platform>/srs/. In one or more subsequent steps, the srs files are aggregated into res files, one for each language.
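The line-insertion part of what transex3 does can be illustrated with a short sketch. It is not transex3's parser: the regular expression, the translation dictionary, and the rule of skipping languages without a translation are all assumptions made for illustration, and the translated strings are assumed to be already escaped for src syntax. Input lines are expected without trailing newlines (for example from str.splitlines()).

```python
import re

# Matches lines such as:  Text [en_US] = "...";
# HYPOTHETICAL pattern: whitespace tolerant, accepts en_US or en-US.
EN_LINE = re.compile(r'^(\s*)(\w+)\s*\[\s*en[-_]US\s*\]\s*=\s*"((?:[^"\\]|\\.)*)"\s*;\s*$')


def add_languages(src_lines, translations):
    """Insert translated copies after each en_US line.

    translations: {lang: {english_text: translated_text}} (hypothetical;
    the real tool takes its strings from the merged sdf data).
    """
    out = []
    for line in src_lines:
        out.append(line)
        m = EN_LINE.match(line)
        if not m:
            continue
        indent, key, text = m.groups()
        for lang in sorted(translations):
            translated = translations[lang].get(text)
            if translated is not None:
                out.append(f'{indent}{key} [{lang}] = "{translated}";')
    return out
```

For the input line `    Text [en_US] = "OK";` and French and German translations, the output keeps the original line and adds `Text [de] = ...;` and `Text [fr] = ...;` lines with the same indentation.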

The resulting res files are delivered to main/solver and become part of the installation sets. Multi-language versions contain res files for more than one language.

At runtime, the ResMgr class from the tools module is responsible for using the resource files of the currently selected language whenever a string is requested (as is the case, for example, for all button texts and, in general, for all text visible in the GUI).
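Conceptually, the runtime lookup maps a resource ID to a string from the res file of the selected language. The toy class below illustrates only that idea; it is not the ResMgr API, and the fallback to en-US when a string is missing is an assumption added for illustration.

```python
class ResourceLookup:
    """Toy stand-in for per-language resource lookup (not the ResMgr API).

    resources: {lang: {resource_id: string}}, one dict per res file.
    """

    def __init__(self, resources, ui_language, fallback="en-US"):
        self.resources = resources
        self.ui_language = ui_language
        self.fallback = fallback  # ASSUMED fallback behaviour

    def get_string(self, res_id):
        """Return the string for res_id in the UI language, else the fallback."""
        for lang in (self.ui_language, self.fallback):
            table = self.resources.get(lang, {})
            if res_id in table:
                return table[res_id]
        raise KeyError(res_id)
```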

Critique

The current localization workflow as outlined above has several drawbacks.

The workflow looks more like an ad-hoc solution than a designed approach.
The tools involved are written in a variety of languages: C++, Java, Perl, and Python. This is not bad in itself. For example, it makes sense to parse Java property files with Java code. But there is also C++ code for iterating over the tree of source files that uses hard-coded lists of other executables and scripts for processing individual files. This results in many processes being created and destroyed, which is notoriously slow on Windows.
Some of the tools are not used anymore. For example, I did not find any .xtx, .xrb, .xxl, .xgf, or .xcd files, so the xbtxex and xmlex tools can be dropped. (This may already have happened for xmlex.) Others are used but do not run (like the jpropex tool). Finally, there is our own preprocessor for handling resource files, which might be replaceable by the standard C/C++ preprocessor (that preprocessor parses the hrc files anyway, since they are also included by C++ code).
OpenOffice uses its own non-standard file format (SDF) for handling localized strings. In order to use a pootle server for the actual translation, all .sdf files have to be transformed into .po files and, after translation, back into .sdf files.
The localization workflow is convoluted and hard to understand. Much tooling is involved outside the build process. This results in a manual process that is known only to a select few.
