User:JanIversen/jan test
Introduction
This document is based on and extends Localization_for_developers. It is a work in progress showing the result of a detailed technical analysis of the current process (version 3.4.1). As such, this document should be seen as a replacement for Localization_for_developers.
The l10n process only concerns itself with localizing the defined, supported languages. Adding a new language is an i18n process. This document is further restricted to the ongoing translation process and the closely related build process. External events, e.g. Germany changing its spelling rules, should be covered by i18n procedures.
The document will hopefully spark a discussion so that it can be updated with other views from the ooo-L10n@incubator.apache.org mailing list.
It is important to understand the current process before we start discussing detailed changes, so this is the main purpose of this page. Once all the open issues at the end of the document have been discussed and solutions agreed upon, a new document will be made describing the process as it should be in the near future.
Thanks to everyone who contributed to Localization_for_developers, which has been a great starting point for this document.
Overview
Localization, often abbreviated as l10n, is the process of making a software package available in local languages other than the language of the developer.
Seen from the perspective of the people involved, localization is a multi-step process that involves a variety of tools and procedures. Most importantly, the 4 main categories of people involved have quite different and to some extent conflicting views and requirements, so the process should be a real “best of all worlds” approach.
The current process is more or less purely developer oriented, contains a lot of different tools and depends a lot on the responsibility of the involved people. It seems to be a process that has grown out of necessity more than a planned road.
Most of the tools used, as well as the central data format (SDF), are specific to AOO and not used anywhere else, even though both the source (C++, resource and UI files) and the target (po files) are standard file formats.
Only a part of the workflow is integrated in the build system. Much of it requires manual steps. Some of the tools involved are not part of the OpenOffice SVN and, due to a hard disk crash of the old pootle server, are lost.
Translations are done with the help of a pootle server. In short, the localization workflow can be seen as:
- extracting messages from source files.
- uploading messages to the pootle server.
- translating messages on the pootle server.
- downloading messages from the pootle server.
- merging messages into source files.
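The five steps form a round trip. The following minimal Python sketch models them as plain functions on in-memory dictionaries. All function names and data layouts are hypothetical; they do not correspond to the actual AOO tools or to the pootle API, and the fallback to English in the last step is an assumption made for illustration.

```python
# Hypothetical round trip: source messages -> translation store -> merged result.
# None of these functions correspond to real AOO tools or to the pootle API.

def extract(sources):
    """Step 1: collect the English messages found in the source files."""
    return {
        msg_id: text
        for file_msgs in sources.values()
        for msg_id, text in file_msgs.items()
    }

def upload(store, messages, language):
    """Step 2: make the messages available for translation, keeping existing work."""
    existing = store.setdefault(language, {})
    for msg_id, text in messages.items():
        existing.setdefault(msg_id, {"source": text, "target": ""})
    return store

def translate(store, language, msg_id, target):
    """Step 3: a translator fills in the target text."""
    store[language][msg_id]["target"] = target

def download(store, language):
    """Step 4: fetch all translated messages for one language."""
    return {m: e["target"] for m, e in store[language].items() if e["target"]}

def merge(messages, translations):
    """Step 5: combine source and translation (assumed fallback: English)."""
    return {m: translations.get(m, text) for m, text in messages.items()}

sources = {"dialog.src": {"OK_BUTTON": "OK", "CANCEL_BUTTON": "Cancel"}}
store = {}
messages = extract(sources)
upload(store, messages, "da")
translate(store, "da", "CANCEL_BUTTON", "Annuller")
localized = merge(messages, download(store, "da"))
print(localized)
```

In this sketch the untranslated message keeps its English text, which is one possible policy and not necessarily what the real build does.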
If you are looking for information about how to contribute translations then Localization gives an overview.
The document has 5 parts:
- a relatively non-technical overview of the process,
- a detailed technical overview of the process,
- a detailed technical data flow/storage view,
- a detailed technical view of the tools used with parameters etc,
- an open issues list.
Actors and Systems
The l10n process can and should be viewed with respect to 4 different categories of people who access the process through 2 different systems. Translators consider the pootle server to be the repository, whereas the others consider SVN the main repository.
Note: this view only relates to the l10n procedure, the picture for the whole project is a lot more complex.
The red lightning indicates that the pootle server only works indirectly on the SVN server.
The red lightning indicates that data is being copied:
- to/from pootle server, which requires manual intervention during the build process
- to the tester, which is quite normal, since a tester normally gets an install set.
Developers
Developers construct the actual program, using dedicated development tools.
As part of the development process, developers embed messages (errors, warnings …) in the source code and/or build the UI. The embedded texts are defined to be in English, but the source code is written in different programming languages, making extraction a challenge.
Developers are fluent in their programming language (C++, Java, Python etc.) but certainly not in all the native languages supported by AOO; therefore localization is needed.
Developers use SVN as their sole repository.
Translators
Translators add texts in the local native language, relating (translating) to the original message. In a release there is a 1-n relation between the original message and the supported languages, where n is the number of supported languages.
Translators do not, in principle, need programming knowledge, because in essence they are presented with a list of texts extracted from the source and deliver the translated texts back.
Translators work solely with the pootle server, which today has no direct connection to SVN but works in parallel with SVN and is updated manually at regular intervals.
Integrators
Integrators initiate and control the build process.
Integrators do not, in principle, need programming or translation knowledge, because they are basically doing administrative tasks.
Testers
Testers check the total system and do a quality assurance of the behavior.
Testers need a deep knowledge of the behavior of the system, but deep technical knowledge is not needed.
Today testing seems to be very limited and not formalized in respect of the l10n process.
System: SVN
The Subversion (SVN) server is the actual repository, and ideally all systems should work directly on this server.
All source files, documents etc. are stored in SVN.
System: pootle server
The pootle server provides an environment for translators to work in.
Today the pootle server contains all the translations and is updated from SVN; as a consequence the two are not synchronized, and the translations are without version control (during the translation process).
Furthermore many translators work offline without any control.
L10n workflow high altitude view
The workflow seen from the outside is quite simple, but some of the shortcomings should still be very obvious.
The workflow is designed as a waterfall, but one of the good Norwegian ones where water is pumped back up at night. Ideally each section is done only once for each release (waterfall), but in real life two things happen (Norwegian night pumping):
- Some sections happen in parallel (e.g. translators start working with early code)
- Some sections are repeated due to problems found in later sections
This is quite normal and normally not a real problem provided the process is automated and has a number of quality gates.
However, in the current process there is only a single automated quality gate, which is purely technical (answering: “Can the product be built without errors?”); the rest is left to us humans.
The workflow only concentrates on the l10n process which is only a subset of the total lifecycle process.
The model shows at least one problem: the parallelism of “Translation online” and “Translation offline”. To put it pointedly, this works because there are no alternatives and because there are few volunteers.
Content creation
Developers construct/develop new functionality or correct bugs/issues using different tools and programming languages. During programming they may insert texts in the source files; this is done very differently depending on the programming language and the type of application (UI or error/information messages).
All text is written in English according to the programming guidelines; however, there is no review process to secure the quality of the text or its consistency with the rest of the product.
Note: A developer can insert the text directly in the source file or in a resource file. For the program both ways work; however, only a limited number of file extension types are scanned for texts today, so in the worst case some texts are never translated.
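The risk described in the note can be made visible by listing source files whose extension is not covered by the extraction. The following is a minimal sketch only: the set of scanned extensions is an assumption for illustration (the real list used by the extraction tools is not given here), and the file names are invented.

```python
from pathlib import PurePosixPath

# Assumed set of scanned extensions, for illustration only; the real list
# used by the extraction tools is not given here.
SCANNED_EXTENSIONS = {".src", ".hrc"}

def not_scanned(paths, scanned=SCANNED_EXTENSIONS):
    """Return the paths whose file extension is not in the scanned set."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() not in scanned]

# Hypothetical file names.
source_files = ["sw/ui/dialog.src", "sw/inc/strings.hrc", "sw/core/error.cxx"]
candidates = not_scanned(source_files)
print(candidates)
```

A file on this list does not necessarily contain user-visible text; the list only shows where embedded texts would go unnoticed by the extraction.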
Upload pootle server
The source files are stored in SVN. In general the content of SVN is floating, since it contains the very latest updates, with the consequence that a total build will very often fail. To circumvent this problem a snapshot is made from time to time, guaranteeing a successful build, although the package might not function correctly.
The snapshots can be used for a manually started extraction to the pootle server.
The extraction program loops over all files in SVN:
- it builds one big sdf file.
- the sdf file is then split into multiple template files.
- the template files are merged with the existing po files in the pootle server.
- the pootle server database contains one set of po files for each language.
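The split and merge steps above can be sketched as follows. This is a simplified illustration, not the real tooling: the four-field record is a stand-in for an sdf line (the real SDF format has more columns), the split by module is an assumption, and the merge rule (keep a translation only if its English source is unchanged) is one possible policy.

```python
from collections import defaultdict

# Simplified stand-in for sdf lines; the real SDF format has more columns.
# Assumed layout: (module, source file, message id, English text).
sdf_records = [
    ("sw", "ui/dialog.src", "TITLE", "Paragraph"),
    ("sw", "ui/dialog.src", "HELP", "Help"),
    ("sc", "ui/cell.src", "TITLE", "Cell"),
]

def split_into_templates(records):
    """Split one big record list into one template per module."""
    templates = defaultdict(dict)
    for module, source_file, msg_id, text in records:
        templates[module][(source_file, msg_id)] = text
    return dict(templates)

def merge_template(template, existing):
    """Merge a fresh template with the existing translations of one language.

    A translation is kept only if its English source text is unchanged;
    new or changed messages start out untranslated (empty string).
    Messages that no longer exist in the template are dropped.
    """
    merged = {}
    for key, text in template.items():
        old = existing.get(key)
        keep = old is not None and old["source"] == text
        merged[key] = {"source": text, "target": old["target"] if keep else ""}
    return merged

templates = split_into_templates(sdf_records)
existing_da = {
    ("ui/dialog.src", "TITLE"): {"source": "Paragraph", "target": "Afsnit"},
    ("ui/dialog.src", "HELP"): {"source": "Help text", "target": "Hjælpetekst"},
}
merged_sw = merge_template(templates["sw"], existing_da)
print(merged_sw)
```

Here the changed "HELP" message loses its old translation and must be translated again, while the unchanged "TITLE" keeps it.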
The purpose is to decouple the development process from the translation process. The purpose is achieved, but the route is highly manual and error prone.
If life were ideal, translation would only take place when development is completed, but typically translation takes place at several stages of the development process, for several reasons:
- A release consists of changes to multiple function groups (e.g. draw, write and calc), and these developments are finished at different points in time. Whenever the development of a group is finished, this group can be translated, and thus the decoupling will be repeated.
- Translation often takes place while testing is ongoing. Any bug fix must lead to a new decoupling, and since there is no version control of the translated parts, whether there are changes can only be checked manually.
- There are currently no shortcuts to quickly translate a bug fix that involves a known text change.
Note: This part of the process is highly manual and very error-prone, since it involves coordinating the efforts of a large number of people.
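Since changes between two decouplings have to be found by hand today, a comparison of two extractions illustrates what such a check involves. This is a minimal sketch, assuming each extraction is available as a mapping from message id to English text (a hypothetical representation, not the real file format).

```python
def diff_extractions(old, new):
    """Compare two extractions (message id -> English text)."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical message ids and texts.
before = {"TITLE": "Paragraph", "HELP": "Help", "OLD": "Obsolete"}
after = {"TITLE": "Paragraph", "HELP": "Help text", "NEW": "New entry"}
report = diff_extractions(before, after)
print(report)
```

Only the "added" and "changed" entries need a translator's attention, which is the kind of shortcut a bug fix with a small text change would benefit from.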
Translation
Translation takes place on an offline copy consisting of multiple po files. These po files are generated each time, so any additional information the translators would like to keep (e.g. comments) is lost.
At the moment there are 276 different files to translate for each language. In order to split the work, UI and Help are separated; there are:
- 20 help files (but they are big!)
- 256 UI/message files (typically an average of 20 lines)
Having that many files to translate makes it more likely to get content inconsistency (same term is translated differently).
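The kind of inconsistency meant here can be detected mechanically. The sketch below assumes the translated files have already been read into (source, target) pairs per file; the file names and the Danish translations are invented for illustration, and reading real po files is left out.

```python
from collections import defaultdict

def find_inconsistent(files):
    """Return source texts that have more than one distinct translation.

    `files` maps a file name to a list of (source, target) pairs.
    """
    seen = defaultdict(set)
    for pairs in files.values():
        for source, target in pairs:
            if target:  # ignore untranslated entries
                seen[source].add(target)
    return {s: sorted(t) for s, t in seen.items() if len(t) > 1}

# Hypothetical file names and translations.
files = {
    "sw/ui.po": [("Cancel", "Annuller"), ("Help", "Hjælp")],
    "sc/ui.po": [("Cancel", "Fortryd"), ("Help", "Hjælp")],
}
conflicts = find_inconsistent(files)
print(conflicts)
```

A reported conflict is not always an error, since the same English word can legitimately need different translations in different contexts; the result is a list for human review.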
Since the files are generated solely from the sources, there is no glossary file available, making it very difficult for new volunteers to help. Furthermore, there is no control of how accelerators are used.
The online and offline translation processes are handled quite differently.
Note: Today there is no version control and as such no computer-controlled review; as a consequence the content quality varies.
Translation online (“committer”)
The po files are stored in the pootle server database and are thereby available to translators through the HTML interface.
Due to the lack of version control, team work must be controlled carefully.
Once a translation is complete, the translator(s) must manually inform the integrator that the set is ready for merge.
Translation offline (non “committer”)
The integrator will manually extract the po files from the pootle server and send the files to the translators without “committer” status. The copy is not under version control or otherwise controlled.
Once the translation is complete, the translator must send the files back to the integrator.
There is no computerized tracking of which translations are outstanding, which are in manual review and which are completed; this is currently tracked by the integrator.
Note: Neither bugzilla nor the mailing list allows these big attachments, so they must be sent to a private mail address or posted on a private web page.
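The bookkeeping that the integrator does by hand amounts to a status per language and file. The following is a minimal sketch of that bookkeeping; the three states are taken from the text above, while the data layout, names and file names are hypothetical.

```python
from collections import Counter
from enum import Enum

class Status(Enum):
    OUTSTANDING = "outstanding"
    IN_REVIEW = "in manual review"
    COMPLETED = "completed"

# Hypothetical bookkeeping: (language, po file) -> status.
tracking = {
    ("da", "sw/ui.po"): Status.COMPLETED,
    ("da", "sc/ui.po"): Status.IN_REVIEW,
    ("de", "sw/ui.po"): Status.COMPLETED,
}

def summary(tracking, language):
    """Count the files of one language per status."""
    return Counter(s for (lang, _), s in tracking.items() if lang == language)

def ready_for_merge(tracking, language):
    """A language is ready when it has files and all of them are completed."""
    statuses = [s for (lang, _), s in tracking.items() if lang == language]
    return bool(statuses) and all(s is Status.COMPLETED for s in statuses)

print(summary(tracking, "da"))
print(ready_for_merge(tracking, "de"))
```

A language without any tracked files is treated as not ready, so that a missing hand-in is not mistaken for a finished translation.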
Merge SVN
The integrator must manually decide that all offline translations are back and all online translators have finished (translation review is left to the individual translator teams). At a point in time decided by the integrator the merge is started, which consists of several manual steps:
- synchronize po files with the content of the pootle server database
- add the offline translated files
- convert po files to sdf files (one per language)
- store the sdf files in SVN.
This part of the process does not allow for glossary files, because the converters would have no source parts to relate the glossary to.
Update pootle server
Now it is time to synchronize the pootle server, to make sure the content is identical with SVN. Based on the new sdf files (one per language) the following actions are taken:
- convert sdf to template files
- update templates in the pootle server
Language build
Finally a test release can be built, and the testers can check the final result. It should be noted that there is currently no formal testing of the native language versions.
Simplified data flow
The current data flow is pretty complex, and it seems more like an “invented as needed” structure. The first part shows the text flow from developer to translator: