Using TeX hyphenation patterns in OpenOffice.org

From Apache OpenOffice Wiki

Summary: this document describes how to properly use TeX hyphenation patterns in OpenOffice.org and in other software that uses the Hyphen hyphenation library.

Written by Martin Srebotnjak; some portions of text contributed by László Németh and Mojca Miklavec.

Introduction

OpenOffice.org uses Hyphen, part of the Hunspell project, as its hyphenation engine.

The hyphenation data for a language consists of two files:

  • the patterns file (a text file with all the patterns and extra hyphenation rules; hyph_xx_YY.dic) and
  • the readme file (a text file with all the credits and licensing information; README_hyph_xx_YY.txt).
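The naming convention above can be captured in a small helper. This is only an illustrative sketch: the function name hyphenation_file_names is hypothetical and not part of Hyphen or any office suite.

```python
def hyphenation_file_names(language, country):
    """Return (patterns file, readme file) names for a locale such as sl_SI.

    Hypothetical helper that only mirrors the hyph_xx_YY.dic and
    README_hyph_xx_YY.txt naming scheme described above.
    """
    locale = f"{language.lower()}_{country.upper()}"
    return f"hyph_{locale}.dic", f"README_hyph_{locale}.txt"


print(hyphenation_file_names("sl", "SI"))
# → ('hyph_sl_SI.dic', 'README_hyph_sl_SI.txt')
```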

The language descriptor xx_YY is built from ISO codes (xx for the language, YY for the country); you can look it up in the following table:

From OpenOffice.org 3.0 onwards the hyphenation patterns are packed as an extension, usually as part of a dictionary language pack (together with a spell-checking dictionary for the same language and, optionally, a thesaurus). Here is a list of available dictionary language packs:

Using TeX patterns

Hyphen (and therefore OpenOffice.org) can use TeX hyphenation patterns for hyphenation. This is a big advantage, because TeX patterns are available for more than 50 different languages.
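For readers unfamiliar with TeX patterns, the following is a minimal sketch of how Liang-style patterns mark hyphenation points. It illustrates the basic idea only and is not the optimized implementation used by Hyphen. The function names and the three toy patterns are made up for the example and do not belong to any real language.

```python
import re


def parse_pattern(pattern):
    """Split a TeX pattern such as 'a1b' into its letters and gap levels.

    levels[j] is the number written before letters[j] (0 if none);
    the last entry is the number written after the final letter.
    """
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels


def hyphenation_points(word, patterns):
    """Return indices i where a break between word[:i] and word[i:] is allowed."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[k]: gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels:
                for offset, level in enumerate(levels):
                    k = start + offset
                    scores[k] = max(scores[k], level)
    # Odd levels allow a break, even levels forbid it.
    return [i for i in range(1, len(word)) if scores[i + 1] % 2 == 1]


# Toy patterns: 'a1b' allows a-b, '1c' allows a break before c,
# and 'b2c' overrides the latter inside 'bc'.
print(hyphenation_points("abc", ["a1b", "1c", "b2c"]))  # → [1], i.e. a-bc
```

The highest number found for each gap wins, which is how a more specific pattern can forbid a break that a more general one would allow.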

However, because TeX hyphenation and Hyphen differ, the TeX hyphenation patterns must first be converted. If they are not converted, several issues can surface:

  • not all TeX patterns will work in OpenOffice.org, which means that hyphenation will be worse than the patterns allow;
  • if the code page is not set correctly, the TeX patterns can behave erratically in OpenOffice.org.

Conversion of TeX patterns

Follow the conversion steps below in the given order:

1. Download up-to-date TeX hyphenation patterns

The TeX hyphenation repository contains up-to-date TeX hyphenation patterns. They are located here:

Example: for the Slovenian language one would download the file hyph-sl.pat.txt from the SVN repository.

2. Convert the TeX hyphenation patterns file into the proper character set

Hyphen for OpenOffice.org (prior to version 3.4) uses ISO-8859-X code pages, while the TeX hyphenation patterns are in UTF-8. The downloaded patterns must therefore be converted into the proper ISO-8859-X code page.

Example: the Slovenian language uses the ISO-8859-2 code page, so one would open the UTF-8 file in a text editor that handles code pages, then convert it and save it as ISO-8859-2 encoded text.
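If no suitable text editor is at hand, the same conversion can be scripted. The following is a minimal sketch using only the Python standard library. The function name transcode_patterns and the output file name in the usage comment are assumptions made for this example. The sketch fails loudly if the patterns contain a character that the target code page cannot represent, which is a useful sign that the wrong code page was chosen.

```python
from pathlib import Path


def transcode_patterns(source, target, target_encoding="iso8859_2"):
    """Re-encode a UTF-8 patterns file into a single-byte code page.

    Returns the number of bytes written. Raises ValueError if a
    character does not exist in the target code page.
    """
    text = Path(source).read_text(encoding="utf-8")
    try:
        data = text.encode(target_encoding)
    except UnicodeEncodeError as err:
        bad = text[err.start]
        raise ValueError(
            f"character {bad!r} cannot be stored in {target_encoding}"
        ) from err
    Path(target).write_bytes(data)
    return len(data)


# Usage (the output file name is hypothetical):
# transcode_patterns("hyph-sl.pat.txt", "hyph-sl.iso8859-2.txt")
```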

3. Run the conversion script

The Hyphen library (based on libhnj by Raph Levien) uses a time-optimized implementation of Liang's original hyphenation algorithm from TeX, and this implementation requires the patterns to be converted. You can download the latest version of the conversion script from the Hyphen repository. At the time of writing this was:


The script takes the following parameters: the input file name, the output file name, the code page, and the LEFTHYPHENMIN and RIGHTHYPHENMIN values, which define the minimum number of characters that must remain before the first and after the last hyphenation point of a word.
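To illustrate what the two hyphenmin values do, here is a small sketch that filters candidate break positions. The function name apply_hyphenmin and the candidate positions are assumptions made for this example; the sketch is not code from Hyphen or from the conversion script.

```python
def apply_hyphenmin(word, points, left_min=2, right_min=2):
    """Drop break positions that leave too few characters on either side.

    A position i means a break between word[:i] and word[i:].
    """
    return [
        i for i in points
        if i >= left_min and len(word) - i >= right_min
    ]


# Hypothetical candidates for "hyphenation": after 1, 2, 6 and 10 characters.
print(apply_hyphenmin("hyphenation", [1, 2, 6, 10]))  # → [2, 6]
```

With both values set to 2, a break that would leave a single letter at the start or at the end of the word is discarded.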

Example: for Slovenian the ISO-8859-2 code page is used, and the left and right hyphenmin values are both 2. So one would use:

./ hyph-sl.pat.txt hyph_sl_SI.dic ISO8859-2 2 2

Warning: note how ISO8859-2 is used and not ISO-8859-2! Remember to omit the first hyphen in the ISO codepage name!
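The spelling rule from the warning can be expressed as a tiny helper, which is handy when the conversion is scripted for several languages. The function name script_codepage_name is hypothetical. The sketch only handles names that start with "ISO-" and leaves everything else untouched.

```python
def script_codepage_name(name):
    """Turn 'ISO-8859-2' into 'ISO8859-2' by dropping the first hyphen.

    Names that do not start with 'ISO-' are returned unchanged.
    """
    if name.upper().startswith("ISO-"):
        return name[:3] + name[4:]
    return name


print(script_codepage_name("ISO-8859-2"))  # → ISO8859-2
```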

4. Add hyphenation rules for special characters

Special characters (apostrophe, hyphen, n-dash, m-dash ...) are word characters in OpenOffice.org, but they are not boundary characters in its hyphenation, which is incompatible with the TeX boundary hyphenation patterns. This can result in bad hyphenation of words that contain hyphens and other special characters. Please consider adding the following lines at the end of the converted hyphenation patterns file (hyph_xx_YY.dic):


Note: "..." represents all missing lines for other characters of your alphabet.

5. Create/update the appropriate readme file

The readme file contains the credits (author of the patterns, other collaborators...) and licensing information.

Official TeX hyphenation patterns are released under the GNU Lesser General Public License (LGPL), which makes them appropriate for inclusion into OpenOffice.org.

If you use TeX hyphenation patterns from other sources, remember to check the license they are available under and to mention it in the readme file.

The hyphenation patterns to be included in the official builds must also include the filled-in form data.

6. Create/update an OpenOffice.org dictionary extension

Package the converted hyphenation patterns in a new extension (or update the existing one) and upload it to the extension repository.
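The packaging step can be scripted as well. The sketch below assumes that an extension file (.oxt) is an ordinary zip archive and that the source directory already contains everything the extension needs (the converted patterns, the readme file, description.xml and the other extension metadata). The function name pack_extension and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_extension(source_dir, oxt_path):
    """Zip every file under source_dir into oxt_path, keeping relative paths.

    oxt_path should lie outside source_dir so that the archive
    does not try to include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(oxt_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source).as_posix())
    return oxt_path


# Usage (directory and file names are hypothetical):
# pack_extension("dict-sl", "dict-sl.oxt")
```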

Before uploading, test extensively whether it works properly with different versions of OpenOffice.org.

If the patterns perform well and are licensed under the LGPL, they could eventually make it into the official releases of OpenOffice.org.

The future

Support for UTF-8 patterns is planned for OpenOffice.org 3.4. This would make Step 2 (above) obsolete and change the conversion command from Step 3 into:

./ hyph-sl.pat.txt hyph_sl_SI.dic UTF-8 2 2

However, patterns in UTF-8 will not work with older versions of OpenOffice.org (prior to 3.4). If you do decide to make a version of your extension with hyphenation patterns in UTF-8, do not forget to set the required OpenOffice.org version in the extension's description.xml to 3.4 or higher, like this:

        < value="3.4" d:name=" 3.4" />

This will allow older versions of OpenOffice.org to keep using the older, non-UTF-8 version of your extension.


Since the Hyphen engine (and with it the hyphenation patterns files) is also used by other open-source software projects, following these instructions will provide correct patterns for programs like Scribus, KOffice etc.


Content on this page is licensed under the Public Documentation License (PDL).