Revision as of 01:39, 1 May 2008

Breaking encapsulation of ICU BreakIterator

Because of Issue 84467 (a duplicate of Issue 81519), we use the RuleBasedBreakIterator() constructor directly and then want to call setBreakType() on the resulting object.
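The idea can be sketched roughly as follows. This is a stand-alone illustration, not ICU code: `FakeRuleBasedBreakIterator` is a hypothetical stand-in for ICU's RuleBasedBreakIterator, whose setBreakType() is assumed here to be protected, and `ExposedBreakIterator` and `publicSetBreakType` are names invented for the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for ICU's RuleBasedBreakIterator: the break type
// can only be changed from inside the class hierarchy.
class FakeRuleBasedBreakIterator {
public:
    virtual ~FakeRuleBasedBreakIterator() = default;
    std::int32_t breakType() const { return fBreakType; }

protected:
    void setBreakType(std::int32_t type) { fBreakType = type; }

private:
    std::int32_t fBreakType = -1;  // -1 means "not set yet" (illustrative value)
};

// Thin subclass that re-exports the protected setter to outside callers.
class ExposedBreakIterator : public FakeRuleBasedBreakIterator {
public:
    void publicSetBreakType(std::int32_t type) { setBreakType(type); }
};
```

A caller would construct `ExposedBreakIterator` instead of the base class and call `publicSetBreakType()` right after construction. The cost is a dependency on a member that the base class does not treat as public API.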

ICU code:

BreakIterator reference
RuleBasedBreakIterator reference

OpenOffice.org code:

BreakIterator_Unicode::loadICUBreakIterator function

Mailing list discussions:

Discussion about ICULanguageBreakFactory
ports/121787 FreeBSD problem report
Debian bug 448745
icu-support

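Example reasons to use custom rules:

Issue 72868: Writer/Impress: line does not break after Chinese punctuation and before Latin letters
Issue 80891: character in the forbidden list sometimes appears at the home of line
Issue 83229: wrong hyphenation when word does contain a hyphen
Issue 83649: Line break should be between typographical quote and left bracket
Issue 83464: line brake between letter and $
Issue 81448: slash and backslash make non-braking spaces of preceding spaces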

Use cases of `loadICUBreakIterator`

Questions:

Why does wordRule need to be static and preserved across calls?
Is the rule string `word` used at all? What about other WordTypes?

| Public method | `loadICUBreakIterator` call | Resulting rule text |
|---|---|---|
| `nextCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone)`; `prevCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone)` | `loadICUBreakIterator(rLocale, LOAD_CHARACTER_BREAKITERATOR, 0, "char", Text)` | `char` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, ANYWORD_IGNOREWHITESPACES, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, ANYWORD_IGNOREWHITESPACES, NULL, Text)` | `edit_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, DICTIONARY_WORD, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, DICTIONARY_WORD, NULL, Text)` | `dict_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, WORD_COUNT, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, WORD_COUNT, NULL, Text)` | `count_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, another_word_type, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, another_word_type, NULL, Text)` | `word` (???) |
| `beginOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale)`; `endOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale)` | `loadICUBreakIterator(rLocale, LOAD_SENTENCE_BREAKITERATOR, 0, NULL, Text)` | NULL |
| `getLineBreak(const OUString& Text, sal_Int32 nStartPos, const lang::Locale& rLocale, sal_Int32 nMinBreakPos, const LineBreakHyphenationOptions& hOptions, const LineBreakUserOptions& /*rOptions*/)` | `loadICUBreakIterator(rLocale, LOAD_LINE_BREAKITERATOR, 0, "line", Text)` | `line` |
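Read as code, the word-related rows of the table amount to a small lookup from word type to rule name. The sketch below is only an illustration of that mapping: `WordKind` and `wordRuleName` are hypothetical names, and the enum does not reproduce the real WordType constants or their numeric values.

```cpp
#include <string>

// Hypothetical mirror of the word types named in the table.
enum class WordKind { AnyWordIgnoreWhitespaces, DictionaryWord, WordCount, Other };

// Returns the rule text that the table lists for each word type.
std::string wordRuleName(WordKind kind) {
    switch (kind) {
        case WordKind::AnyWordIgnoreWhitespaces: return "edit_word";
        case WordKind::DictionaryWord:           return "dict_word";
        case WordKind::WordCount:                return "count_word";
        case WordKind::Other:                    break;
    }
    return "word";  // fallback row, the one marked "(???)" in the table
}
```

Character and line iterators pass their rule name ("char", "line") explicitly, and the sentence iterator passes NULL, so none of them would go through a lookup like this.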

Check whether the locale's BreakIteratorRules ({edit_word, dict_word, count_word, char, line}) provide a rule for the requested locale.
If not, try to load the rule named rule + "_" + lang anyway.
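The two steps above can be sketched as a name-resolution function. This is a simplified model under stated assumptions, not the actual OpenOffice.org implementation: `LocaleRules` and `resolveRuleName` are hypothetical, the locale data is modelled as a plain map from rule key to rule name, and the real loading of rule data is left out.

```cpp
#include <map>
#include <string>

// Hypothetical model of the locale data: rule key -> rule name to load.
using LocaleRules = std::map<std::string, std::string>;

// Picks the name of the rule data to load for a given rule key and language.
std::string resolveRuleName(const LocaleRules& localeRules,
                            const std::string& rule,
                            const std::string& lang) {
    auto it = localeRules.find(rule);
    if (it != localeRules.end() && !it->second.empty()) {
        return it->second;  // step 1: the locale supplies a rule
    }
    return rule + "_" + lang;  // step 2: try rule + "_" + lang anyway
}
```

For example, with no locale entry for `char`, a request for language `th` resolves to `char_th`. Whether rule data with that name actually exists is a separate question that the caller still has to handle.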

@@ Line 1: / Line 1: @@
 =Breaking encapsulation of ICU BreakIterator=
-Because of {{Bug|84467}}, we are using <code>RuleBasedBreakIterator() constructor</code> and then we want to <code>setBreakType()</code> there.
+Because of {{Bug|84467}} (duplicate of the {{Bug|81519}}) we are using <code>RuleBasedBreakIterator() constructor</code> and then we want to <code>setBreakType()</code> there.
 ICU code:
@@ Line 9: / Line 9: @@
 OpenOffice.org code:
 * [http://l10n.openoffice.org/source/browse/l10n/i18npool/source/breakiterator/breakiterator_unicode.cxx?rev=1.34&view=markup BreakIterator_Unicode::loadICUBreakIterator] function
+Mailing list discussions:
+* [http://www.nabble.com/Minor-changes-needed-to-ICULanguageBreakFactory-(ICU4C)-td10069414.html Discussion] about [http://bugs.icu-project.org/trac/ticket/5695 ICULanguageBreakFactory]
+* [http://www.freebsd.org/cgi/query-pr.cgi?pr=121787 ports/121787] FreeBSD problem report
+* [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=448745 Debian bug 448745]
+* [http://sourceforge.net/mailarchive/forum.php?thread_name=200804301825.10247.mi%2Bicu%40aldan.algebra.com&forum_name=icu-support icu-support]
+Example reasons to use custom rules:
+* {{Bug|72868|Writer/Impress: line does not break after Chinese punctuation and before Latin letters}}
+* {{Bug|80891|character in the forbidden list sometimes appears at the home of line}}
+* {{Bug|83229|wrong hyphenation when word does contain a hyphen}}
+* {{Bug|83649|Line break should be between typographical quote and left bracket}}
+* {{Bug|83464|line brake between letter and $}}
+* {{Bug|81448|slash and backslash make non-braking spaces of preceding spaces}}
 =Use cases of <code>loadICUBreakIterator</code>=

Difference between revisions of "LoadICUBreakIterator"
