Extension Dictionaries

From Apache OpenOffice Wiki

Already provided configuration entries

Each implementation of a spell checker, hyphenator, or thesaurus needs an entry in the configuration (i.e. in the file Linguistic.xcu) stating its implementation name (for identification) and the format names of the dictionaries it can handle. More than one format can be listed if the implementation supports several.

The dictionary file format names currently used by the Apache OpenOffice linguistic components are

  • DICT_SPELL
  • DICT_HYPH and
  • DICT_THES


The respective entries therefore look like this:

  • Spell checker entry
<node oor:name="ServiceManager">
    <node oor:name="SpellCheckers">
        <node oor:name="org.openoffice.lingu.MySpellSpellChecker" oor:op="fuse">
            <prop oor:name="SupportedDictionaryFormats" oor:type="oor:string-list">
                <value>DICT_SPELL</value>
            </prop>
        </node>
 
        <!-- ... entries for other spell checkers ... -->
    </node>
</node>
  • Hyphenator entry
<node oor:name="ServiceManager">
    <node oor:name="Hyphenators">
        <node oor:name="org.openoffice.lingu.LibHnjHyphenator" oor:op="fuse">
            <prop oor:name="SupportedDictionaryFormats" oor:type="oor:string-list">
                <value>DICT_HYPH</value>
            </prop>
        </node>
 
        <!-- ... entries for other hyphenators ... -->
    </node>
</node>
  • Thesaurus entry
<node oor:name="ServiceManager">
    <node oor:name="Thesauri">
        <node oor:name="org.openoffice.lingu.new.Thesaurus" oor:op="fuse">
            <prop oor:name="SupportedDictionaryFormats" oor:type="oor:string-list">
                <value>DICT_THES</value>
            </prop>
        </node>
 
        <!-- ... entries for other thesauri ... -->
    </node>
</node>


The only link between one of the above services and the dictionaries it uses is the name of the dictionary format. When invoked, a service is required to check in the configuration which dictionaries it can make use of, and thereby establishes its set of dictionaries. It does this by looking at the format names of the configured dictionaries.
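As a hedged illustration of this link and of the multi-format case, the following sketch registers a spell checker that claims two formats. The implementation name org.example.lingu.ExampleSpellChecker and the format name DICT_EXAMPLE are hypothetical and exist only for this example. The sketch also assumes that the items of an oor:string-list are separated by whitespace, as in the Locations values used for dictionaries on this page.

```xml
<node oor:name="ServiceManager">
    <node oor:name="SpellCheckers">
        <!-- Hypothetical implementation name, for illustration only -->
        <node oor:name="org.example.lingu.ExampleSpellChecker" oor:op="fuse">
            <prop oor:name="SupportedDictionaryFormats" oor:type="oor:string-list">
                <!-- DICT_SPELL is a format name from this page; DICT_EXAMPLE is made up -->
                <value>DICT_SPELL DICT_EXAMPLE</value>
            </prop>
        </node>
    </node>
</node>
```

With such an entry, the service would be expected to pick up every configured dictionary whose Format property is either DICT_SPELL or DICT_EXAMPLE.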

Dictionary entries (must be provided)

The entries that are still missing and need to be provided are those for the dictionaries. Each dictionary must have an entry of its own.

An entry consists of

  • a unique name (the node name in the configuration)
  • a list of file locations (only the ones actually needed by the service implementation)
  • a single format name and
  • a list of ISO-names for locales listing the languages the dictionary may be used for

Please note that the files listed in the Locations property have no specified order. It is therefore the implementation's task to tell those files, and their potentially different uses, apart by their names only.


Thus a set of dictionary entries in the Linguistic.xcu provided by a single extension may look like this:

 <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
        <node oor:name="HunSpellDic_de_CH" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/de_CH.aff %origin%/de_CH.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>de-CH</value>
            </prop>
        </node>
        <node oor:name="HunSpellDic_en_US" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/en_US.aff %origin%/en_US.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-US</value>
            </prop>
        </node>
        <node oor:name="HyphDic_en_US" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_en_US.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_HYPH</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-US</value>
            </prop>
        </node>
        <node oor:name="HyphDic_de_CH" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_de_CH.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_HYPH</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>de-CH</value>
            </prop>
        </node>
        <node oor:name="ThesDic_de_CH" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/th_de_CH_v2.dat %origin%/th_de_CH_v2.idx</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_THES</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>de-CH</value>
            </prop>
        </node>
        <node oor:name="ThesDic_en_US" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/th_en_US_v2.dat %origin%/th_en_US_v2.idx</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_THES</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-US</value>
            </prop>
        </node>
    </node>
 </node>
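The fragment above shows only the ServiceManager node. As a sketch of how it might sit inside a complete configuration file, the example below wraps a single entry in a root element. The root element, its namespace declarations, and the component name Linguistic in package org.openoffice.Office are assumptions based on the usual layout of .xcu files and are not stated on this page. The dictionary name, the file names, and the choice of locales are hypothetical as well. The entry lists two locales to show that one dictionary may serve several languages.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed root element for a Linguistic.xcu file -->
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    oor:name="Linguistic" oor:package="org.openoffice.Office">
    <node oor:name="ServiceManager">
        <node oor:name="Dictionaries">
            <!-- Hypothetical dictionary that is offered for two locales -->
            <node oor:name="ExampleSpellDic_de" oor:op="fuse">
                <prop oor:name="Locations" oor:type="oor:string-list">
                    <value>%origin%/de_DE.aff %origin%/de_DE.dic</value>
                </prop>
                <prop oor:name="Format" oor:type="xs:string">
                    <value>DICT_SPELL</value>
                </prop>
                <prop oor:name="Locales" oor:type="oor:string-list">
                    <value>de-DE de-AT</value>
                </prop>
            </node>
        </node>
    </node>
</oor:component-data>
```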


About node names for the dictionaries:

If people outside the core developer circle want to provide an
extension, using reversed-domain notation is recommended. If,
e.g., you are employed by the company "linguprovider" in Russia, which has the
domain linguprovider.ru, you could name your node

"ru.linguprovider.grabinski.dict_ru"

This should help to avoid name clashes. If your company is too big for
you to feel comfortable being the only "Grabinski", you could add
more "namespaces", e.g.

"ru.linguprovider.division1.grabinski.dict_ru"

Just break it down to a level where you think you can arrange everything
easily.

If you are providing your dictionary as a private person, you can use
your own domain, your e-mail address, etc. This should help to keep the
probability of name clashes low.
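Applied to a dictionary entry, the naming scheme described above could look like the following sketch. The node name is the one from the example. The file names ru_RU.aff and ru_RU.dic and the locale ru-RU are assumptions made only for illustration.

```xml
<node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
        <!-- Node name in reversed-domain notation to avoid name clashes -->
        <node oor:name="ru.linguprovider.grabinski.dict_ru" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <!-- Hypothetical file names -->
                <value>%origin%/ru_RU.aff %origin%/ru_RU.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>ru-RU</value>
            </prop>
        </node>
    </node>
</node>
```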


A very simple dictionary extension providing several dictionaries at once is attached to issue 81365.

A more complete extension will probably also want to provide a description.xml in order to give a short description and a version number for the extension.

A sample description.xml is shown below.

A sample description.xml for a dictionary extension

<?xml version="1.0" encoding="UTF-8"?>
<description xmlns="http://openoffice.org/extensions/description/2006" xmlns:d="http://openoffice.org/extensions/description/2006"  xmlns:xlink="http://www.w3.org/1999/xlink">
 
 
    <!-- SHOULD OR MUST BE PROVIDED ENTRIES FOLLOWING... -->
 
 
    <!--Here you can state the license text to be displayed during installation.
        You can provide more than one localized version if you like.
        If no matching locale was found the first one will be displayed.
        !!! Don't change the values for 'accept-by' or 'suppress-on-update';    !!!
        !!! doing so might be troublesome in multi-user installations if no     !!!
        !!! shared-layer installation can be done.                              !!!
 
        !!! IMPORTANT: if the dictionary is to be part of the OOo installation it !!!
        !!! MUST NOT have a registration entry with a license to be displayed.    !!!
        !!! Otherwise the installation will break!                                !!!
        !!! Thus this entry should be used for downloadable dictionaries only.    !!!
    <registration>
        <simple-license accept-by="admin" suppress-on-update="false" >
            <license-text xlink:href="LISEZMOI.txt" lang="fr-FR" />
        </simple-license>
    </registration>
    -->
 
    <!--The version of your extension. NOT the one of OpenOffice.org...
        It will also be used to automatically check if there are updates for this
        extension available. Newer versions should have higher values.
        Only digits and '.' may be used.
    -->
    <version value="1.2.1" />
 
    <!--A unique identifier for your extension.
        In order to avoid name clashes with other extensions it should probably hold
        your company name and maybe your full name along with the name of the extension in 
        a form named reversed-domain-notation which would look like this
            org.openoffice. ...
            net.MyWebpage.www.DictionaryName
        For the very same reason they should NOT start with 'org.openoffice'. That string
        should only be used for extensions shipped with OOo.
        When choosing the identifier keep in mind that others may provide a dictionary for that
        very same language as well and even then your identifier still needs to be unique!
    -->
    <identifier value="net.MyWebpage.www.MyName.OOo-Dictionaries.fr-FR" />
 
    <!--A name for the extension to be used in the UI.
        For dictionaries it should show the supported locales
        and the purpose (spell checking and/or hyphenation and/or thesaurus).
        The display name can be localized and there should be at least one
        entry for each language it implements and one default English entry.
        The default entry is the one listed first.
    -->
    <display-name>
        <name lang="en">French (France) spell check dictionary</name>
        <name lang="fr">... to be done ...</name>
    </display-name>
 
    <!--Dictionaries should work with all platforms...-->
    <platform value="all" />
 
    <!--The minimal OpenOffice.org version that the extension requires.
        For dictionary extensions that will be 'OpenOffice.org 3.0'
    -->
    <dependencies>
        <OpenOffice.org-minimal-version value="3.0" d:name="OpenOffice.org 3.0" />
    </dependencies>
 
 
    <!-- MORE OPTIONAL ENTRIES FOLLOWING (may easily be omitted, commented out by default)... -->
 
 
    <!--If you uploaded your extension to the repository (which should be the default!)
        you do not need to have this one.
    <update-information>
        <src xlink:href="http://extensions.openoffice.org/testarea/desktop/license/update/lic3.update.xml" />
    </update-information>
    -->
 
    <!--Check if this is already generated by the repository.
        Otherwise you may want to provide it manually.
    <publisher>
        <name xlink:href="http://extensions.openoffice.org/testarea/desktop/publisher/publisher_en.html" lang="en">My dictionary extension (en)</name>
        <name xlink:href="http://extensions.openoffice.org/testarea/desktop/publisher/publisher_fr.html" lang="fr">My dictionary extension (fr)</name>
    </publisher>
    -->
 
    <!--This link will be generated by the repository. Check if this already works for multiple languages.
        If not, you may provide it manually if you like.
    <release-notes>
        <src xlink:href="http://extensions.openoffice.org/testarea/desktop/publisher/release-notes_en.txt" lang="en" />
        <src xlink:href="http://extensions.openoffice.org/testarea/desktop/publisher/release-notes_fr.txt" lang="fr" />
    </release-notes>
    -->
 
</description>

Hints for developers

If you want to test a new version of a dictionary extension before releasing it, you don't need to go through the uninstall/install cycle each time. Extensions are unpacked when they are installed, and you can find your dictionary somewhere inside the $(share)/uno_packages/cache or $(user)/uno_packages/cache folder of your Apache OpenOffice installation (depending on whether the dictionary was installed for all users or for the current user only). To test a new dictionary version, copy it over the old one, restart Apache OpenOffice (in case the dictionary was already loaded), and repeat until you are satisfied.

What you must not do!

You must have at most one dictionary of any type for each language in your extension!

That is, only one per locale for each of the formats DICT_SPELL, DICT_HYPH, and DICT_THES.

Otherwise, if for example you have two different spelling dictionaries with different content, both will be used at the same time(!), which is most likely not what you want. You would thereby take the choice away from the user.

If you want to provide two different spellings for the locale fr-FR, e.g. 'French Reformed' and 'French Classic', you have to provide them in separate extensions! That way the user can explicitly choose which spelling to use.
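A possible way to split the two variants is sketched below: each extension ships its own Linguistic.xcu containing exactly one DICT_SPELL entry for fr-FR. All node names and file names here are hypothetical and serve only as an illustration.

```xml
<!-- Linguistic.xcu fragment of the first extension (hypothetical "French Classic") -->
<node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
        <node oor:name="net.example.dict.fr_FR_classic" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/fr_FR_classic.aff %origin%/fr_FR_classic.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>fr-FR</value>
            </prop>
        </node>
    </node>
</node>

<!-- Linguistic.xcu fragment of the second extension (hypothetical "French Reformed") -->
<node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
        <node oor:name="net.example.dict.fr_FR_reformed" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/fr_FR_reformed.aff %origin%/fr_FR_reformed.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>fr-FR</value>
            </prop>
        </node>
    </node>
</node>
```

Because the two entries live in different extensions, the user decides which spelling is active by installing or enabling only one of them.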

Further readings and notes

Uploading and installing extensions

You can browse all available extensions here.

For a list of currently available dictionary extensions in the repository, just check here.

Extensions should be created as oxt files and uploaded to the repository. Only then will checking for available updates happen automatically. You can upload your extension here.
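An oxt file is a zip archive, and the extension manager needs to be told which of the contained files is the configuration file. This page does not show that part, so the following META-INF/manifest.xml is only a sketch under assumptions: it assumes that the configuration file is stored as Linguistic.xcu in the root of the archive and that it is registered with the media type application/vnd.sun.star.configuration-data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
    <!-- Assumed location of the configuration file inside the archive -->
    <manifest:file-entry manifest:media-type="application/vnd.sun.star.configuration-data"
                         manifest:full-path="Linguistic.xcu"/>
</manifest:manifest>
```

The dictionary files referenced via %origin% in the Locations properties would then be placed next to the configuration file in the same archive.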

Integration of a dictionary extension to the installation set

See Spellchecker_Integration_into_Installation_Set for how to integrate a dictionary into the installation set.
