Use ICU for more I18N functions implementation

From Apache OpenOffice Wiki
Revision as of 07:00, 28 April 2012 by Zhangjf (Talk | contribs)


ICU is already widely used for many I18N functions in the current code base, such as the text break iterator, calendar, text layout and, most recently, the regexp replacement. This document describes the idea of replacing more I18N-related functions with ICU. Depending on ICU's capabilities, some replacements have already been implemented in Symphony 3, while most are still only technical concepts or only partially feasible. Technically, the direction should be to use ICU for all I18N functions wherever possible.


Text conversion
AOO currently has its own conversion tables for several encodings in the sal module, but the number of supported encodings is limited. ICU supports a more complete set of encoding conversions for nearly all platforms and languages, and it also has better conversion performance. Because the icuuc library is already linked into many AOO modules, using it for conversion can also reduce the memory footprint slightly.
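
To make the idea concrete, here is a minimal sketch of table-free conversion that pivots through Unicode, which is also how ICU's converters work (code page to UTF-16 and back). It uses Python's built-in codecs purely as a stand-in, not ICU itself; the function name `transcode` is hypothetical and does not exist in AOO or ICU.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes between two encodings by pivoting through Unicode.

    Stand-in for a library converter: no per-encoding tables are kept
    by the caller, only the encoding names are passed in.
    """
    # Step 1: source bytes -> Unicode text (raises UnicodeDecodeError on bad input).
    text = data.decode(source_encoding, errors="strict")
    # Step 2: Unicode text -> target bytes (raises UnicodeEncodeError if unmappable).
    return text.encode(target_encoding, errors="strict")


# HIRAGANA LETTER A (U+3042) encoded in Shift_JIS.
sjis_bytes = b"\x82\xa0"
utf8_bytes = transcode(sjis_bytes, "shift_jis", "utf-8")
```

The point of the design is that the application holds no conversion data of its own; supporting another encoding only requires that the shared library knows its name.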

Character classification
As with text conversion, ICU's character classification functions can be leveraged so that AOO no longer needs to maintain its own tables. Because the character type definitions differ, a mapping between AOO and ICU types may be needed.
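
The mapping step might look roughly like the sketch below. It reads the Unicode general category (the same property that ICU's `u_charType` reports) from Python's `unicodedata` module as a stand-in for ICU, and translates it into application-side flags. The `APP_*` flag names and values are assumptions made up for this illustration; they are not AOO's actual constants.

```python
import unicodedata

# Hypothetical application-side character-type flags (illustration only,
# not the real AOO definitions).
APP_DIGIT = 0x01
APP_UPPER = 0x02
APP_LOWER = 0x04
APP_TITLE = 0x08
APP_LETTER = 0x10
APP_CONTROL = 0x20

# Unicode general category -> application flags.
CATEGORY_TO_APP_FLAGS = {
    "Lu": APP_UPPER | APP_LETTER,   # uppercase letter
    "Ll": APP_LOWER | APP_LETTER,   # lowercase letter
    "Lt": APP_TITLE | APP_LETTER,   # titlecase letter
    "Lo": APP_LETTER,               # other letter (no case)
    "Nd": APP_DIGIT,                # decimal digit
    "Cc": APP_CONTROL,              # control character
}


def app_char_type(ch: str) -> int:
    """Return the application flags for one character, or 0 if unmapped."""
    category = unicodedata.category(ch)
    return CATEGORY_TO_APP_FLAGS.get(category, 0)
```

With such a table, only the small category-to-flag map has to be maintained by the application; the per-character data comes from the library and follows its Unicode version.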

Formatting messages
Use ICU MessageFormat so that translation strings can carry message patterns with variable elements, allowing each language to place the arguments in its own natural order. This is a common scenario in text translation and is not well supported now.
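
A small sketch of why numbered placeholders help: the caller always passes the arguments in the same order, and each translation decides where they appear. Python's `str.format` is used here as a stand-in because it accepts the same simple `{0}`, `{1}` placeholders as an ICU MessageFormat pattern; quoting rules and the plural/select/typed arguments of ICU are not covered. The `PATTERNS` table and `format_message` helper are hypothetical, and the sample translations are illustrative.

```python
# Hypothetical per-language message patterns. Argument 0 is the current
# page, argument 1 is the page count; the Japanese pattern uses them in
# the opposite order from the English one.
PATTERNS = {
    "en": "Page {0} of {1}",
    "ja": "{1}ページ中{0}ページ目",
}


def format_message(language: str, *args: object) -> str:
    """Fill the pattern for `language` with positional arguments."""
    return PATTERNS[language].format(*args)


english = format_message("en", 3, 10)
japanese = format_message("ja", 3, 10)
```

Without patterns, code that concatenates "Page ", the number, " of " and the count fixes the English word order and cannot be translated cleanly.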

String classes and text manipulation
Technically, ICU string classes could replace nearly all AOO text/string functions, which would avoid the maintenance work and make it easier to keep up with Unicode version upgrades. But it is not an easy task and not necessary in the short term.

Locale data repository
AOO also maintains its own locale repository in the i18npool module. I don't know how it is generated, but it looks somewhat out of sync with the Unicode CLDR repository data. Ideally it would be taken over completely by third-party modules. But because it supports more date/time/number patterns than are defined in CLDR, we can consider partially using ICU for locale data in the short term.

Number/Date/Time formatter
This is one of the basic I18N functions in ICU. Unfortunately, AOO uses a different pattern symbol set from ICU, and a few patterns have no corresponding symbol on the ICU side. For backward compatibility we cannot switch to ICU's pattern symbols, so we need a mapping between the two symbol sets. Until the two function sets match, we can, as with locale data above, consider partially using ICU in the short term.
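
The "map where possible, fall back otherwise" approach could be sketched as follows. The table is a small illustrative subset of date codes believed to correspond between the two symbol sets; it is not complete and has not been verified against either implementation. Time codes and quoted literal text are deliberately left out. The function name and the made-up token `XX` are hypothetical.

```python
import re
from typing import Optional

# Illustrative subset: AOO-style date code -> ICU-style pattern symbol.
AOO_TO_ICU = {
    "YYYY": "yyyy",  # four-digit year
    "YY": "yy",      # two-digit year
    "MMMM": "MMMM",  # full month name
    "MMM": "MMM",    # abbreviated month name
    "MM": "MM",      # two-digit month
    "M": "M",        # month without leading zero
    "DD": "dd",      # two-digit day
    "D": "d",        # day without leading zero
    "NNN": "EEEE",   # full weekday name
    "NN": "EEE",     # abbreviated weekday name
}

# A run of one repeated ASCII letter, or any single non-letter character.
TOKEN = re.compile(r"([A-Za-z])\1*|[^A-Za-z]")


def to_icu_date_pattern(aoo_pattern: str) -> Optional[str]:
    """Translate a date pattern, or return None if any code has no mapping.

    A None result tells the caller to keep using the existing formatter
    for this pattern instead of ICU.
    """
    parts = []
    for match in TOKEN.finditer(aoo_pattern):
        token = match.group(0)
        if match.group(1) is None:
            parts.append(token)          # separator: copy unchanged
        elif token in AOO_TO_ICU:
            parts.append(AOO_TO_ICU[token])
        else:
            return None                  # no counterpart: fall back
    return "".join(parts)
```

Returning `None` rather than guessing keeps existing documents rendering exactly as before for any pattern that cannot be expressed on the ICU side.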
