Difference between revisions of "Documentation/DevGuide/OfficeDev/Configuring a Filter in OpenOffice.org"

From Apache OpenOffice Wiki

Revision as of 07:12, 1 October 2008




Structure of the configuration

As described previously, detecting types and finding filters in OpenOffice.org is carried out by the com.sun.star.document.TypeDetection service, which uses configuration data as input. The configuration node that contains all this information is org.openoffice.Office.TypeDetection. Its basic structure is as follows:

Structure of org.openoffice.Office.TypeDetection Configuration Branch

As the figure shows, the node consists of structures that in the terminology of the Configuration Manager are called sets. As opposed to configuration lists, sets are extendable configuration nodes. This allows the Configuration Manager to merge several files containing the same node and to present all set elements found in any of the merged files as part of a common set. Lists behave differently: if the same list is found in several configuration files, one of them overwrites the others. The ability to merge configuration nodes enables the deployment of filter configuration data (and so the deployment of filters) in extensions. Without it, all filter configuration data would have to be defined in the OpenOffice.org installation.
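The difference between merging sets and overwriting lists can be sketched in a few lines of Python. This is only a model of the behavior described above, not the Configuration Manager itself. The layer contents, the element names and the rule that the last layer wins for lists are assumptions made for the illustration.

```python
def merge_set(*layers):
    """Merge set nodes: the elements of every layer end up in one common set."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def merge_list(*layers):
    """Merge list nodes: one layer replaces the others (assumed here: the last one)."""
    result = []
    for layer in layers:
        result = list(layer)
    return result


# Hypothetical configuration layers: the installation and one extension.
installation_types = {"example_text": {"extensions": ["txt"]}}
extension_types = {"example_custom": {"extensions": ["xmp"]}}

all_types = merge_set(installation_types, extension_types)
print(sorted(all_types))  # ['example_custom', 'example_text']
```

Because both layers contribute elements to the same set, an extension can add its own type without touching the data shipped with the installation.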

There are three sets: types, filters and frame loaders. A type describes a content, and filters or frame loaders describe objects that can be used to load such content into an OpenOffice.org document. Arrows in the picture point to structures on the right side. They show the content (properties) of the different set elements. Similar to 1:n relations in a database, every filter or frame loader is registered for one or multiple types.
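The relation between types and the objects registered for them can be pictured as a simple lookup. The sketch below uses plain dictionaries with a hypothetical "types" key. The entry names are invented, and the real property names in the configuration may differ.

```python
# Hypothetical, simplified entries; not taken from a real installation.
FILTERS = {
    "Example Text Filter": {"types": ["example_text"]},
    "Example Import Filter": {"types": ["example_text", "example_sheet"]},
}
FRAME_LOADERS = {
    "Example Loader": {"types": ["example_sheet"]},
}


def registered_for(type_name, registry):
    """Return the names of all entries registered for the given type."""
    return sorted(
        name for name, props in registry.items() if type_name in props["types"]
    )


print(registered_for("example_text", FILTERS))
# ['Example Import Filter', 'Example Text Filter']
```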

Caution: If you want to add filters to the configuration, do not edit the installed configuration files of OpenOffice.org directly. Provide the data as an extension instead and install this extension for a single user or for all users.

TypeDetection

Before the properties of types, filters and frame loaders are described in detail, let's have a look at how the type detection uses them to detect types and filters. The com.sun.star.document.TypeDetection service can be used to just detect the type of a particular content. While a type is being detected, some information about a possible filter for that type may already have accrued. If the TypeDetection is part of a loading process where not only a type but also a filter needs to be detected, this suggestion can be used to save an extra filter detection step. Otherwise this detection would have to be carried out by the generic frame loader, which accesses the filter configuration data through the com.sun.star.document.FilterFactory service.

When the TypeDetection receives a URL or a MediaDescriptor, it first checks some "external" attributes of the content specified this way. This could be a file extension, a URL pattern or other properties in the MediaDescriptor. If the MediaDescriptor does not already contain the name of the content type, the TypeDetection looks for the entry in the "Types" part of its configuration that best matches these attributes. See the chapter about the type properties for the kinds of attributes that are available and how they are used.
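A minimal sketch of this check on external attributes follows, with a plain dictionary standing in for the MediaDescriptor. The type entries, their property names and the order of the checks (URL pattern before extension) are assumptions made for illustration. The real matching rules are defined by the type properties.

```python
import posixpath
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical type entries; names and properties are invented for this sketch.
TYPES = {
    "example_text": {"extensions": ["txt"], "url_patterns": []},
    "example_doc": {"extensions": ["odt"], "url_patterns": []},
    "example_help": {"extensions": [], "url_patterns": ["vnd.example.help://*"]},
}


def flat_detect(descriptor):
    """Return a type name based on external attributes only, or None."""
    # A type name that is already in the descriptor is taken as given.
    if descriptor.get("TypeName"):
        return descriptor["TypeName"]
    url = descriptor["URL"]
    # Assumed order: URL patterns first, then the file extension.
    for name, props in TYPES.items():
        if any(fnmatchcase(url, pattern) for pattern in props["url_patterns"]):
            return name
    extension = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    for name, props in TYPES.items():
        if extension in props["extensions"]:
            return name
    return None


print(flat_detect({"URL": "file:///tmp/report.odt"}))  # example_doc
```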

If a type has been detected based on these attributes, OpenOffice.org can verify this detection with real code that checks the content, not only its external attributes. For this purpose each type may have an attribute "DetectService". It is the implementation or service name of an object that implements the abstract service com.sun.star.document.ExtendedTypeDetection. This object examines the content. It gets a MediaDescriptor containing the name of the type to confirm and returns this name if it matches the content. The DetectService is allowed to return another type name if it knows that this type matches better, even if the external attributes did not select it in the first place.

If the external attributes did not help OpenOffice.org to find a type, it instantiates all registered DetectServices and asks them to check the content until one of them returns a valid type name. The called DetectService can tell that it is called for "guessing" rather than for confirmation, because in this case no type name is passed to it in the MediaDescriptor.
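The two modes, confirmation and guessing, can be modeled as follows. The detectors here are plain Python functions standing in for DetectServices, and the "Bytes" key and all type names are hypothetical. The sketch also assumes that a type which fails confirmation falls through to the guessing round, which the text above does not specify.

```python
def zip_detector(descriptor):
    """Stand-in DetectService: recognizes content that starts with a ZIP signature."""
    return "example_package" if descriptor["Bytes"].startswith(b"PK\x03\x04") else ""


def text_detector(descriptor):
    """Stand-in DetectService: recognizes plain printable ASCII text."""
    printable = all(b in b"\t\r\n" or 32 <= b < 127 for b in descriptor["Bytes"])
    return "example_text" if printable else ""


# Hypothetical mapping of type names to their DetectService.
DETECTORS = {"example_package": zip_detector, "example_text": text_detector}


def deep_detect(descriptor, flat_type=""):
    """Confirm flat_type if one is given, otherwise let every detector guess."""
    if flat_type and flat_type not in DETECTORS:
        return flat_type  # the type has no DetectService, nothing to verify
    if flat_type:
        # Confirmation: the type name to confirm is passed in the descriptor.
        confirmed = DETECTORS[flat_type](dict(descriptor, TypeName=flat_type))
        if confirmed:
            return confirmed
    # Guessing: no type name is passed; the first valid answer wins.
    for detector in DETECTORS.values():
        stripped = {k: v for k, v in descriptor.items() if k != "TypeName"}
        guess = detector(stripped)
        if guess:
            return guess
    return ""


print(deep_detect({"Bytes": b"PK\x03\x04..."}, flat_type="example_text"))
# example_package
```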

The next step is to check whether a frame loader is registered for the detected type. If no frame loader is found, the generic frame loader implementation of OpenOffice.org is used. As mentioned above, this service detects a filter if the TypeDetection service has not already provided this information.
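The fallback to the generic frame loader, including the reuse of a filter suggestion from type detection, might look like this in outline. All names in the sketch are placeholders, including the string used for the generic loader, and the registries are simplified to plain dictionaries.

```python
GENERIC_LOADER = "generic frame loader"  # placeholder, not a real service name

# Hypothetical registrations.
FRAME_LOADERS = {"Example Special Loader": ["example_special"]}
FILTERS = {"Example Text Filter": "example_text"}


def plan_load(type_name, suggested_filter=None):
    """Return (loader, filter) for a detected type."""
    for loader, types in FRAME_LOADERS.items():
        if type_name in types:
            return loader, None  # a dedicated frame loader handles this type
    # No frame loader registered: the generic one is used. It only has to
    # look up a filter if type detection did not already suggest one.
    filter_name = suggested_filter or next(
        (name for name, filter_type in FILTERS.items() if filter_type == type_name),
        None,
    )
    return GENERIC_LOADER, filter_name


print(plan_load("example_text"))
# ('generic frame loader', 'Example Text Filter')
```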

The most important external attribute of a content is its file extension, and often this is the only one used. As extensions do not need to be unique, OpenOffice.org may find several possible types for an extension. While there is a preferred type (or at least there should be one), API programmers can override it with a type preselection. It is also possible to use a filter preselection or a document type preselection. The latter can be seen as a suggestion to OpenOffice.org to load a content with a particular OpenOffice.org application. If this is possible, OpenOffice.org will do that, otherwise it will proceed as usual. One of the most common use cases is to load an HTML file in Calc from the command line. Using "soffice -Calc $FILENAME" instead of just "soffice $FILENAME" triggers a document type preselection.
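How a preferred type and a document type preselection interact can be illustrated with a small model. The type entries, the "preferred" flag and the "application" key are simplifications invented for this sketch. In the real configuration the relation between a type and an application is expressed through other properties.

```python
# Hypothetical types that share one file extension.
TYPES = [
    {"name": "example_html_writer", "extension": "html",
     "preferred": True, "application": "writer"},
    {"name": "example_html_calc", "extension": "html",
     "preferred": False, "application": "calc"},
]


def select_type(extension, preselected_application=None):
    """Pick a type for an extension, honoring a preselection when possible."""
    candidates = [t for t in TYPES if t["extension"] == extension]
    if preselected_application:
        for candidate in candidates:
            if candidate["application"] == preselected_application:
                return candidate["name"]
        # The suggestion cannot be followed: proceed as usual.
    for candidate in candidates:
        if candidate["preferred"]:
            return candidate["name"]
    return candidates[0]["name"] if candidates else None


print(select_type("html"))          # example_html_writer
print(select_type("html", "calc"))  # example_html_calc
```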

ExtendedTypeDetection

Based on the registered types, flat detection is already possible, that is, the assignment of types to, for example, a URL on the basis of configuration data only. Flat detection cannot always deliver a correct result: imagine someone changing the file extension of a text document from .odt to .txt. To ensure correct results, deep detection is needed, that is, the content itself has to be examined. The com.sun.star.document.ExtendedTypeDetection service performs this task. An implementation of it is called a detector. It gets all the information collected about a document and decides which type to assign to it. In the new modular type detection, the detector is a UNO service that registers itself in OpenOffice.org and is called by the generic TypeDetection mechanism when necessary.
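The renamed .odt file is a good case for a small deep detection sketch. An OpenDocument text file is a ZIP package whose "mimetype" entry contains application/vnd.oasis.opendocument.text, so the content can be recognized regardless of the file name. The function below is plain Python working on an in-memory stream, not a UNO detector. The function name is hypothetical.

```python
import io
import zipfile

ODT_MIME = "application/vnd.oasis.opendocument.text"


def looks_like_odt(stream):
    """Examine the content itself: a ZIP package with the ODT mimetype entry."""
    stream.seek(0)
    if not zipfile.is_zipfile(stream):
        return False
    stream.seek(0)
    with zipfile.ZipFile(stream) as package:
        if "mimetype" not in package.namelist():
            return False
        mimetype = package.read("mimetype").decode("ascii", "replace").strip()
    return mimetype == ODT_MIME


# Build a minimal package in memory, as if an .odt file had been renamed to .txt.
renamed = io.BytesIO()
with zipfile.ZipFile(renamed, "w") as package:
    package.writestr("mimetype", ODT_MIME)

print(looks_like_odt(renamed))                     # True
print(looks_like_odt(io.BytesIO(b"plain text")))   # False
```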

To extend the list of content types known to OpenOffice.org, we suggest implementing a detector component in addition to a filter. It improves the generic detection of OpenOffice.org and makes the results more reliable.

Inside OpenOffice.org, a detector service is called with an already opened stream that is used to find out the content type. If no stream is given, this indicates that someone else is using the service (for example, from outside OpenOffice.org). The detector is then allowed to open its own stream by using the URL part of the MediaDescriptor. If the resulting stream is seekable, it should be set inside the descriptor after its position is reset to 0. If the stream is not seekable, it must not be set. Please follow the already mentioned rules for handling streams.
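These stream rules can be sketched with Python file objects standing in for UNO streams. The descriptor is a plain dictionary, "InputStream" and "URL" are used as its keys, and open_url is a hypothetical callback that opens a stream for a URL. None of this is the actual UNO API.

```python
import io


def read_header(descriptor, open_url, size=4):
    """Read the first bytes of the content while following the stream rules."""
    stream = descriptor.get("InputStream")
    if stream is not None:
        # Called with an already opened stream: use it and rewind afterwards.
        stream.seek(0)
        header = stream.read(size)
        stream.seek(0)
        return header
    # No stream given: open our own one from the URL in the descriptor.
    stream = open_url(descriptor["URL"])
    header = stream.read(size)
    if stream.seekable():
        stream.seek(0)                      # reset the position first ...
        descriptor["InputStream"] = stream  # ... then hand the stream on
    # A non-seekable stream must not be put into the descriptor.
    return header


descriptor = {"URL": "file:///tmp/example.bin"}
header = read_header(descriptor, lambda url: io.BytesIO(b"PK\x03\x04data"))
print(header, descriptor["InputStream"].tell())  # b'PK\x03\x04' 0
```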

Content on this page is licensed under the Public Documentation License (PDL).

