Difference between revisions of "Localization AOO"

From Apache OpenOffice Wiki
(Issues in current l10n process and proposal for new process)

Revision as of 15:49, 16 October 2012

This is an updated version of Localization for developers, currently only available as pdf file:

File:L10proc.pdf

The PDF documents the current procedure in great detail, with diagrams.

For discussion purposes, I have extracted the open issues from it:

Open issues

The current localization workflow as outlined above has several drawbacks and plenty of room for improvement. The drawbacks, as well as other ideas to make the l10n process robust and stable, are collected below. These issues should be discussed either on the wiki or on the mailing list. Once there is a proposed solution to every issue that the community in general agrees to, this document will be converted into the proposed structure with a list of to-dos. The list of issues is not prioritized.

Workflow is not a designed approach

The current workflow was probably created as the need arose, and as a consequence it carries large “left-over” portions from

  • the original OpenOffice (not localized)
  • the Sun era
  • the ongoing integration of OpenOffice into the Apache environment

The l10n process is seen merely as a “must” and is not as interesting to work on as other parts.

The localization workflow is convoluted and hard to understand

Much tooling is involved outside the build process.

Some of this tooling seems to have been lost after a disk crash of the old OpenOffice pootle server.

This results in a manual process that is undocumented and known only to a select few.


Proposal

Once we agree on all issues, a design paper on a proposed structure will be made available as the basis for discussion.


Tools are written in multiple languages

The tools involved are written in a variety of languages: C++, Java, Perl, and Python. This is not bad in itself. For example, it makes sense to parse Java property files with Java code. But there is also C++ code for iterating over the tree of source files that uses hard-coded lists of other executables and scripts for processing individual files. That causes many processes to be created and destroyed, something that is notoriously slow on Windows.

Some of the tools are not used anymore. For example, there are no .xtx, .xrb, .xxl, .xgf, or .xcd files, so the xbtxex and xmlex tools can be dropped (this may already have happened for xmlex). Others are used but do not run (like the jpropex tool). And then there is our own preprocessor for handling resource files, which might be replaceable by the standard C/C++ preprocessor (which parses the included hrc files anyway, since they are included in C++ code).

On Linux or Mac OS you have to pass a fully qualified path to the output file; otherwise you get no output file and no error either. The tooling seems to be very error-prone and leaves a lot of room for improvement. At the moment localize runs with errors on Windows: jpropex, a shell script that calls a Java program, does not run. On Linux it works.
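The silent failure described above can be guarded against with a small wrapper. The following is only a sketch: the function name run_with_checked_output is hypothetical, and it assumes a tool that takes its output file as the last command-line argument, which is not necessarily how localize is invoked.

```python
import subprocess
from pathlib import Path


def run_with_checked_output(command, output_file):
    """Run a tool and fail loudly if it did not write its output file.

    Assumption (hypothetical): the tool accepts the output path as its
    last argument. The path is made absolute before it is passed on, so
    a relative path from the caller cannot be silently ignored.
    """
    out = Path(output_file).resolve()
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run([*command, str(out)], check=True)
    if not out.is_file():
        raise RuntimeError(f"tool reported success but wrote no file: {out}")
    return out
```

A build script would call this instead of invoking the tool directly, so that a missing output file stops the build instead of going unnoticed.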

Streamline the number and implementation of the tools used for extraction and merging of localizable strings. Use the right language for each task.


Proposal

Rewrite localize_sl and include the conversion programs in it (which is more efficient). Use the gcc preprocessor instead of our own.
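To illustrate what "including the conversion programs" could mean, here is a minimal sketch of a single process that walks the source tree once and dispatches on file extension, instead of starting one executable per file. All names (EXTRACTORS, extract_properties, extract_tree) and the extension table are assumptions made for illustration; the property parser is deliberately simplified and is not a replacement for jpropex.

```python
from pathlib import Path


def extract_properties(text):
    """Extract key=value pairs from Java-style property text (simplified)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs.append((key.strip(), value.strip()))
    return pairs


def extract_none(text):
    """Placeholder for formats this sketch does not handle."""
    return []


# Extension -> in-process extractor (assumed mapping, illustration only).
EXTRACTORS = {
    ".properties": extract_properties,
    ".src": extract_none,
}


def extract_tree(root):
    """Walk the tree once and call the matching extractor in-process."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        extractor = EXTRACTORS.get(path.suffix)
        if extractor is None or not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        results[relative] = extractor(path.read_text(encoding="utf-8"))
    return results
```

The point of the design is that adding a file type means adding one function and one table entry, with no extra process creation per file.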


Use of .sdf file

AOO uses its own non-standard file format (SDF) for handling localized strings. In order to use a pootle server for the actual translation, all .sdf files have to be transformed into .po files and, after translation, back into .sdf files. A future migration to the xliff format for translation handout should also be taken into consideration.


Proposal

The .sdf files are merely intermediary files between the source files and the po files, and should be eliminated.

The choice of .po or .xliff is not so easy:

1) source <-> .po and .pot files

The advantage of this approach is that all translators know .po.

The very big disadvantage is that the format has no standard way of storing extra information. We need to store the relative path of the originating source file (as in .sdf) in order to be able to split the information.

2) source ↔ .xliff

The advantage of this approach is that we can store extra information as needed. Furthermore, there are xliff editors out there, including the pootle server. It would also eliminate the need for template files.

The disadvantage is that it is a new format, and offline translators would need to change editor.

--- Personally I would prefer .xliff since it makes programming a lot easier, but I think we need to listen carefully to the translators.
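As a rough illustration of how the relative path of the originating source file could travel inside an xliff file, the sketch below builds a minimal XLIFF 1.2 document with Python's standard library and stores the path in the "original" attribute of the file element. The function name build_xliff, the example path, and the choice of "plaintext" as datatype are assumptions for this sketch, not part of any agreed design.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(source_path, target_lang, units):
    """Build a minimal XLIFF 1.2 document as a string.

    source_path: relative path of the originating source file, kept in
                 the 'original' attribute so strings can be split back.
    units:       list of (id, source_text, target_text) tuples.
    """
    # Serialize the XLIFF namespace as the default namespace.
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": source_path,
        "source-language": "en-US",
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for unit_id, source_text, target_text in units:
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = source_text
        ET.SubElement(unit, f"{{{XLIFF_NS}}}target").text = target_text
    return ET.tostring(xliff, encoding="unicode")


# Hypothetical path and string id, for illustration only.
document = build_xliff("module/source/example.src", "da",
                       [("STR_CLOSE", "Close", "Luk")])
```

Because one xliff document can hold several file elements, a single language file could carry strings from many source files while still recording where each one came from.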


Separate projects for UI and help

We should create 2 separate projects: one for UI and one for Help. We should also keep versions separate, because there will probably be some overlap with potential conflicts. One approach might be to keep two versions in pootle, giving translators the chance to keep working on a translation after a release while development toward the next release proceeds in parallel.

For example something like:

Apache OpenOffice 3.4 UI (aoo34)

Apache OpenOffice 3.4 Help (aoo34help)

Apache OpenOffice 4.0 UI (aoo40)

Apache OpenOffice 4.0 Help (aoo40help)

Note: there are already 2 projects (aoo34 and aoo34help).

At the moment there are 276 different files to translate. Having that many files makes it more likely that the same term is translated differently, and currently there is no glossary available.


Proposal

The process generates 2 files (.xliff or .po) for each language, plus a glossary:

  • localize_ui.<xx>
  • localize_help.<xx>
  • glossary.<xx> this file is not generated but maintained by the translators

These 3 files are delivered to the pootle server, translated, and sent back for storage in SVN. They are handled like any other files with respect to versions and releases.


Build process is highly manual and error prone

Total workflow should be automated.

A developer can insert text directly in the source file or in a resource file; both ways work for the program. However, only a limited number of file extensions are scanned for texts today, so in the worst case some texts are never translated.

Integrate the string extraction into the build process. Most of the files that can contain localizable strings are already part of the build system, mostly for the merge process. For example there are make-rules for transforming and merging rsc files into .srs and then into .res files. Add rules for the string extraction. This would allow developers to count new strings and the buildbot could extract the new strings and upload them to the pootle server.


Proposal

Add a new target in the makefiles (l10n_gen). Developers can then assign which files belong to this target.

Localize_sl should be rewritten so that it can run in multiple makefiles (no directory scanning). Localize_sl will generate a snippet file that is stored in a staging area (l10n/staging), and as the last step in the “build --all” process, l10n will be “built”: the snippets are used to update the single language files. With this process the language files will always be “ready” for use in the build process.

However, the pootle server still needs to be updated manually.
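The last step of the proposed process, updating the language files from the staged snippets, could look roughly like the sketch below. The snippet format (one tab-separated line per string, in files ending in .snippet) and the function names are purely hypothetical; the proposal does not define them.

```python
from pathlib import Path


def read_snippets(staging_dir):
    """Collect (source_path, key) -> English text from all snippet files.

    Assumed snippet format (hypothetical): one UTF-8 line per string,
    'source_path<TAB>key<TAB>text'.
    """
    strings = {}
    for snippet in sorted(Path(staging_dir).glob("*.snippet")):
        for line in snippet.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            source_path, key, text = line.split("\t", 2)
            strings[(source_path, key)] = text
    return strings


def update_language(existing, strings):
    """Merge freshly extracted strings into one language's entries.

    existing: (source_path, key) -> (english, translation)
    Returns the same shape. A translation is kept only while its English
    text is unchanged; otherwise it is reset to '' for retranslation.
    Entries whose strings disappeared from the source are dropped.
    """
    updated = {}
    for ident, english in strings.items():
        old_english, old_translation = existing.get(ident, (None, ""))
        keep = old_translation if old_english == english else ""
        updated[ident] = (english, keep)
    return updated
```

Resetting a translation when the English text changes is one possible policy; marking the entry as needing review while keeping the old translation would be an equally reasonable choice.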


Automatic update of pootle server

Translators need versioning possibilities.

Offline translation needs to be controlled (delivery etc.).

At the moment there is no computerized control over when a translation is ready for merge, nor can a translation be given a status such as “ready for review”.

The pootle server can use SVN directly and thereby offer version control; at the moment, however, this is not used.


Proposal

Make a new subproject in main called l10n. This project contains the language files (basically what is in extras today), but also a .mk file for generation.

The pootle server works directly on SVN. With this philosophy translators are seen as just another breed of developer (both work with languages), and we have all the advantages of a version control system when working on larger translations.


Content control

PO->SDF: There is currently no control of the content quality. It is possible to hand in a translation in which every text is left untranslated, and it will pass.

PO->SDF: There is no check that texts changed in the source have also been changed in the translation.


Proposal

Write a new tool that checks the translated part (based on the idea from poConsistency) and integrate it in the “build --all” process.
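The two missing checks could be sketched as follows. This is not a reimplementation of poConsistency; the function name check_translation and the in-memory data layout are assumptions chosen to keep the example short.

```python
def check_translation(entries, previous=None):
    """Return a list of (key, problem) for suspicious entries.

    entries:  key -> (english, translation) for the current hand-back.
    previous: key -> (english, translation) from the last accepted
              state, or None if there is none.
    """
    previous = previous or {}
    problems = []
    for key, (english, translation) in sorted(entries.items()):
        if not translation.strip():
            problems.append((key, "empty translation"))
        elif translation == english and any(c.isalpha() for c in english):
            # May be legitimate (e.g. "OK"), so treat it as a warning.
            problems.append((key, "identical to source"))
        old = previous.get(key)
        if old and old[0] != english and old[1] == translation:
            problems.append((key, "source changed, translation did not"))
    return problems
```

In a build integration, a language whose problem list exceeds some agreed threshold could be rejected or flagged for review rather than merged.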
