Type Detection and its Configuration

From Apache OpenOffice Wiki

Structure of the configuration

As described previously, detecting types and finding filters in OpenOffice.org is carried out by the com.sun.star.document.TypeDetection service, which uses configuration data as input. The configuration node that contains all this information is org.openoffice.TypeDetection. Its basic structure is as follows:

Structure of org.openoffice.Office.TypeDetection Configuration Branch

As shown on the left side of the picture, the node consists of structures that are called sets in the terminology of the Configuration Manager. As opposed to configuration lists, sets are extendable configuration nodes. This allows the Configuration Manager to merge several files containing the same node and to present all set elements found in any of the merged files as part of a common set. Lists behave differently: if the same list is found in several configuration files, one of them will overwrite the others. The ability to merge configuration nodes enables the deployment of filter configuration data (and so the deployment of filters) in extensions. Without it, all filter configuration data would have to be defined in the OpenOffice.org installation.
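The difference between sets and lists can be illustrated with a minimal Python sketch. It is only a model of the merge behavior described above, not the Configuration Manager's implementation. The dictionaries standing in for configuration files, the node name "Order", the filter names and the rule that the last file wins for lists are all assumptions made for illustration.

```python
# Hypothetical model: each "file" maps a node name to its content.

def merge_set(files, node):
    """Sets: elements from every file end up in one common set."""
    merged = {}
    for data in files:
        merged.update(data.get(node, {}))
    return merged


def merge_list(files, node):
    """Lists: one definition overwrites the others.

    Assumption for this sketch: the last file that defines the list wins.
    """
    result = None
    for data in files:
        if node in data:
            result = data[node]
    return result


# Hypothetical data: one file from the installation, one from an extension.
installation = {
    "Filters": {"example_builtin_filter": {"Type": "example_builtin_type"}},
    "Order": ["a", "b"],
}
extension = {
    "Filters": {"example_extension_filter": {"Type": "example_extension_type"}},
    "Order": ["c"],
}

filters = merge_set([installation, extension], "Filters")
order = merge_list([installation, extension], "Order")

print(sorted(filters))  # filters from both files are present
print(order)            # only one list definition survives
```

In this model the filter contributed by the extension appears next to the built-in one, which is the property that makes deploying filters in extensions possible.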

There are three sets: types, filters and frame loaders. A type describes a kind of content, while filters and frame loaders describe objects that can be used to load such content into an OOo document. Arrows in the picture point to structures on the right side. They show the content (properties) of the different set elements. Similar to 1:n relations in a database, every filter or frame loader is registered for one or multiple types.

Caution: If you want to add filters to the configuration, it is not a good idea to edit the installed configuration files of OpenOffice.org directly. It is better to provide the data as an extension and install this extension for a single user or for all users.


Before the properties of types, filters and frame loaders are described in detail, let's look at how the Type Detection uses them to detect types and filters. The com.sun.star.document.TypeDetection service can be used to just detect the type of a particular content. While a type is being detected, some information about a possible filter for that type may already have accrued. If the TypeDetection is part of a loading process where not only a type but also a filter needs to be detected, this suggestion can be used to save an extra filter detection step. Otherwise this detection would have to be carried out by the generic frame loader by accessing the filter configuration data through the com.sun.star.document.FilterFactory service.

When the TypeDetection receives a URL or a MediaDescriptor, it first checks some "external" attributes of the content specified this way. These can be a file extension, a URL pattern or other properties in the MediaDescriptor. If the MediaDescriptor does not already contain the name of the content type, the TypeDetection looks for the entry in the "Types" part of its configuration that best matches these attributes. See the chapter about the type properties for the kinds of attributes that are available and how they are used.
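The check of external attributes can be modeled roughly as follows. This is a simplified sketch, not the algorithm used by OpenOffice.org: the type names, the property names "Extensions" and "URLPattern", the descriptor key "TypeName" and the rule that a URL pattern is tried before the extension are assumptions, and the sketch returns the first match instead of searching for a best match.

```python
import posixpath
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical type entries; names and property names are assumptions.
TYPES = {
    "example_text": {"Extensions": ["txt"], "URLPattern": []},
    "example_html": {"Extensions": ["html", "htm"], "URLPattern": []},
    "example_factory": {"Extensions": [],
                        "URLPattern": ["private:factory/example*"]},
}


def flat_detect(descriptor):
    """Return a type name based on external attributes only, or None."""
    # A type name already present in the descriptor is taken as given.
    if descriptor.get("TypeName"):
        return descriptor["TypeName"]

    url = descriptor.get("URL", "")

    # First try the URL patterns (wildcard match on the whole URL).
    for name, props in TYPES.items():
        if any(fnmatchcase(url, pattern) for pattern in props["URLPattern"]):
            return name

    # Then fall back to the file extension, compared case-insensitively.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    if extension:
        for name, props in TYPES.items():
            if extension in props["Extensions"]:
                return name
    return None


print(flat_detect({"URL": "file:///tmp/report.HTML"}))
print(flat_detect({"URL": "private:factory/example1"}))
print(flat_detect({"URL": "file:///tmp/data.bin"}))
```

A result of None in this model corresponds to the case where the external attributes do not identify a type.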

If a type has been detected based on these attributes, OpenOffice.org can verify the result with code that examines the content itself rather than only its external attributes. For this purpose each type may have a "DetectService" attribute. It holds the implementation or service name of an object that implements the abstract service com.sun.star.document.ExtendedTypeDetection. This object examines the content: it receives a MediaDescriptor containing the name of the type to confirm and returns this name if it matches the content. It may return a different type name if the DetectService knows that this type is a better match, even if the external attributes did not select it in the first place.

If the external attributes didn't help OpenOffice.org to find a type, it will instantiate all registered DetectServices and ask them to check the content until one of them returns a valid type name. A DetectService can tell that it is called for "guessing" rather than for confirmation, because in this case no type name is passed to it in the MediaDescriptor.
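The two modes of calling a DetectService, confirmation and guessing, can be sketched as below. The classes and functions are stand-ins written for this illustration and are not UNO services. The descriptor keys "Data" and "TypeName", the type names, the magic bytes and the fallback from a failed confirmation to guessing are assumptions.

```python
class MagicDetector:
    """Stand-in for a DetectService that checks the leading bytes."""

    def __init__(self, type_name, magic):
        self.type_name = type_name
        self.magic = magic

    def detect(self, descriptor):
        # A "TypeName" in the descriptor means confirmation mode; without
        # it the detector is guessing. This simple detector behaves the
        # same way in both modes.
        data = descriptor.get("Data", b"")
        return self.type_name if data.startswith(self.magic) else ""


def deep_detect(descriptor, flat_type, detectors_by_type):
    """Confirm flat_type if one was found, otherwise ask every detector."""
    if flat_type:
        detector = detectors_by_type.get(flat_type)
        if detector is None:
            return flat_type  # no DetectService registered for this type
        confirmed = detector.detect(dict(descriptor, TypeName=flat_type))
        if confirmed:
            return confirmed
        # Assumption of this sketch: fall through to guessing.

    # Guessing: no type name is passed to the detectors.
    guess = {k: v for k, v in descriptor.items() if k != "TypeName"}
    for detector in detectors_by_type.values():
        result = detector.detect(guess)
        if result:
            return result
    return ""


# Hypothetical registrations.
detectors = {
    "example_pdf": MagicDetector("example_pdf", b"%PDF"),
    "example_zip": MagicDetector("example_zip", b"PK"),
}

print(deep_detect({"Data": b"%PDF-1.4"}, "example_pdf", detectors))
print(deep_detect({"Data": b"PK\x03\x04"}, "", detectors))
```

A real DetectService may also return a different type name than the one it was asked to confirm; the simple detector above never does.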

The next step is to check whether a frame loader is registered for the detected type. If no frame loader is found, the generic frame loader implementation of OpenOffice.org is used. As mentioned above, this service will detect a filter if the TypeDetection service has not already provided this information. This detection is easily done by sending a filter query to the com.sun.star.document.FilterFactory service. The query encapsulates the algorithm by which OpenOffice.org assigns a filter to a type. The result of the query is the internal name of the desired filter, and the FilterFactory can then be asked to create the filter. Note: filter queries can return more than one filter name, depending on the input. If no preferences have been given, the first one in the returned sequence wins.
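The loader and filter selection described above can be modeled with a short sketch. The registrations, loader names and filter names are hypothetical, and the lookup tables only stand in for the configuration data and the filter query; they are not the FilterFactory API.

```python
# Hypothetical registrations for illustration only.
FRAME_LOADERS = {"example_special": "ExampleSpecialLoader"}
GENERIC_LOADER = "GenericFrameLoader"
FILTERS_BY_TYPE = {
    "example_html": ["example_html_writer", "example_html_calc"],
    "example_txt": ["example_text"],
}


def choose_loader(type_name):
    """Use a registered frame loader, else the generic one."""
    return FRAME_LOADERS.get(type_name, GENERIC_LOADER)


def choose_filter(type_name, suggested_filter=None):
    """Keep a filter suggested during type detection, else query for one."""
    if suggested_filter:
        return suggested_filter
    names = FILTERS_BY_TYPE.get(type_name, [])
    # Without preferences, the first name in the returned sequence wins.
    return names[0] if names else None


print(choose_loader("example_html"))
print(choose_filter("example_html"))
print(choose_filter("example_html", suggested_filter="example_html_calc"))
```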

The most important external attribute of a content is its file extension, and often this is the only one used. As extensions don't need to be unique, OpenOffice.org may find several possible types for one extension. While there is a preferred type (or at least there should be one), API programmers can override it with a type preselection. It is also possible to use a filter preselection or a document type preselection. The latter can be seen as a suggestion to OpenOffice.org to load a content with a particular OpenOffice.org application. If this is possible, OpenOffice.org will do so; otherwise it will proceed as usual. One of the most common use cases is loading an HTML file in Calc from the command line. Using "soffice -Calc $FILENAME" instead of just "soffice $FILENAME" triggers a document type preselection.
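A document type preselection can be thought of as a preference that is honored when a matching filter exists and ignored otherwise. The following sketch models only that rule. The filter names and type names are hypothetical, and the assumption that the preference is matched against a "DocumentService" value of each filter is made for this illustration.

```python
# Hypothetical filter entries; all names are assumptions for this sketch.
FILTERS = [
    {"Name": "example_html_writer", "Type": "example_html",
     "DocumentService": "com.sun.star.text.TextDocument"},
    {"Name": "example_html_calc", "Type": "example_html",
     "DocumentService": "com.sun.star.sheet.SpreadsheetDocument"},
    {"Name": "example_text", "Type": "example_txt",
     "DocumentService": "com.sun.star.text.TextDocument"},
]


def select_filter(type_name, preselected_service=None):
    """Pick a filter for type_name, honoring a preselection if possible."""
    candidates = [f for f in FILTERS if f["Type"] == type_name]
    if preselected_service:
        preferred = [f for f in candidates
                     if f["DocumentService"] == preselected_service]
        if preferred:
            return preferred[0]["Name"]
    # No preselection, or it cannot be satisfied: proceed as usual.
    return candidates[0]["Name"] if candidates else None


CALC = "com.sun.star.sheet.SpreadsheetDocument"
print(select_filter("example_html"))
print(select_filter("example_html", CALC))
print(select_filter("example_txt", CALC))
```

The last call shows the fallback: no spreadsheet filter is registered for the text type in this model, so the preselection is ignored.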

Content on this page is licensed under the Public Documentation License (PDL).