Difference between revisions of "Documentation/DevGuide/OfficeDev/Configuring a Filter in OpenOffice.org"

From Apache OpenOffice Wiki
m (Structure of the configuration)
 
(14 intermediate revisions by 5 users not shown)
Line 7: Line 7:
 
|NextPage=Documentation/DevGuide/OfficeDev/Properties of a Type
 
|NextPage=Documentation/DevGuide/OfficeDev/Properties of a Type
 
}}
 
}}
{{DISPLAYTITLE:Configuring a Filter in {{PRODUCTNAME}}}}
+
{{Documentation/DevGuideLanguages|Documentation/DevGuide/OfficeDev/{{SUBPAGENAME}}}}  
As previously discussed, the whole process of loading and saving content works generically in many components and can be adapted to the needs of a user by adding custom modules or removing others. All this information about services and parameters is organized in a special configuration branch of {{PRODUCTNAME}} called ''org.openoffice.Office.TypeDetection''. The principal structure is shown below:
+
{{DISPLAYTITLE:Type Detection and its Configuration}}
 +
 
 +
===Structure of the configuration===
 +
 
 +
As described previously, detecting types and finding filters in {{AOo}} is carried out by the <idl>com.sun.star.document.TypeDetection</idl> service, which uses configuration data as input. The configuration node that contains all of this information is org.openoffice.TypeDetection. Its basic structure is shown below:
  
 
[[Image:typedetection.png|none|thumb|400px|Structure of org.openoffice.Office.TypeDetection Configuration Branch]]
 
  
As shown on the left, the file consists of lists called sets. The list items are described by the structures shown on the right, to which the arrows point. This works similarly to 1:n relations in a database. Every filter, frame loader and detector is registered for one or multiple types. Detecting the proper type is important for the functionality of the whole system: if the right loader or filter cannot be found, the load or save request does not produce the right results.
+
As shown on the left, the node consists of structures that the Configuration Manager calls sets. Unlike configuration lists, sets are extensible configuration nodes: the Configuration Manager can merge several files that contain the same node and present all set elements found in any of the merged files as part of one common set. Lists behave differently: if the same list is found in several configuration files, one of them overwrites the others. The ability to merge configuration nodes makes it possible to deploy filter configuration data, and therefore filters, in extensions. Without it, all filter configuration data would have to be defined in the {{AOo}} installation.
  
To extend {{PRODUCTNAME}} to load or save new content formats, a new type entry is added describing the new content, and a filter item is registered for this new type. Registering a detector as well is optional but recommended.
+
There are three sets: types, filters and frame loaders. A type describes a kind of content, while filters and frame loaders describe objects that can be used to load such content into an {{AOo}} document. The arrows in the picture point to structures on the right side that show the content (properties) of the different set elements. Similar to 1:n relations in a database, every filter or frame loader is registered for one or more types.
  
{{Documentation/Caution|It is not a good idea to edit the configuration branch files directly to make these changes. It is better to use the configuration API, because the format of the files may change in the future. The properties describing the components, such as types and filters, always stay the same and are not likely to change, at least not in an incompatible manner. It is therefore better to add entries by specifying their properties through the API only. To make this easier for external programmers, this manual provides a {{PRODUCTNAME}} Basic script for that purpose called ''regfilter.bas''.
+
{{Warn|To add filters to the configuration, do not edit the installed configuration files of {{AOo}} directly. Provide the data as an extension instead and install this extension for a single user or for all users.
 
+
}}
The filter programmer only has to provide an ini file that includes the properties and start the Basic script inside {{PRODUCTNAME}}. The script reads the file and uses it to change the configuration package. These changes are made in the user layer of the configuration, so it is possible to restore the original state. The samples folder for this manual also contains an example ini file called ''regfilter.ini'' that can be used for your own purposes.}}
+
  
 
=== TypeDetection ===
 
  
Every content to be loaded must be specified, that is, the type of the content must be well known in {{PRODUCTNAME}}. The type is usually a document type; however, the results of active contents, for example macros, or database contents are also described here.
+
Before the properties of types, filters and frame loaders are described in detail, let's look at how the type detection uses them to detect types and filters. The <idl>com.sun.star.document.TypeDetection</idl> service can be used just to detect the type of a particular content. While a type is being detected, some information about a suitable filter for that type may already become available. If the TypeDetection is part of a loading process in which a filter must be detected as well as a type, this suggestion can be used to save an extra filter detection step. Otherwise, the generic frame loader would have to carry out this detection by accessing the filter configuration data through the <idl>com.sun.star.document.FilterFactory</idl> service.
 
+
A special service <idl>com.sun.star.document.TypeDetection</idl> is used to accomplish this. It provides an API to associate, for example, a URL or a stream with the extensions well known to {{PRODUCTNAME}}, MIME types or clipboard formats. The resulting value is an internal unique type name used for further operations by using other services, for example, <idl>com.sun.star.frame.FrameLoaderFactory</idl>. This type name can be a part of the already mentioned <code>MediaDescriptor</code>.
+
 
+
It is not necessary or useful to replace this service with custom implementations. It works in a generic way on top of a special configuration. Extending the type detection is done by changing the configuration and is described later. These changes are required if new content formats are provided for {{PRODUCTNAME}}, because this is the reason to integrate custom filters into the product.
+
 
+
The <code>TypeDetection</code> also employs the <idl>com.sun.star.document.ExtendedTypeDetection</idl> that examines the given resource and confirms the unique type name determined by <code>TypeDetection</code>. The <code>MediaDescriptor</code> is updated, if necessary, and a unique type name is returned.
+
 
+
=== ExtendedTypeDetection ===
+
 
+
Based on the registered types, flat detection is already possible, that is, the assignment of types, for example to a URL, on the basis of configuration data only. Flat detection cannot always return a correct result: imagine someone changing the file extension of a text document from .odt to .txt. To ensure correct results, deep detection is needed, that is, the content has to be examined. The <idl>com.sun.star.document.ExtendedTypeDetection</idl> service performs this task. It is called a detector. It gets all the information collected on a document and decides which type to assign to it. In the new modular type detection, the detector is a UNO service that registers itself in {{PRODUCTNAME}} and is requested by the generic <code>TypeDetection</code> mechanism, if necessary.
+
  
To extend the list of the known content types of {{PRODUCTNAME}}, we suggest implementing a detector component in addition to a filter. It improves the generic detection of {{PRODUCTNAME}} and makes the results more secure.
+
When the TypeDetection receives a URL or a [[Documentation/DevGuide/OfficeDev/Handling_Documents#MediaDescriptor|MediaDescriptor]], it first checks some "external" attributes of the content specified this way. These can be a file extension, a URL pattern or other properties in the MediaDescriptor. If the MediaDescriptor does not already contain the name of the content type, the TypeDetection looks for the entry in the "Types" part of the TypeDetection configuration that best matches these attributes. See the chapter about [[Documentation/DevGuide/OfficeDev/Properties_of_a_Type|the type properties]] for the kinds of attributes that are available and how they are used.
  
Inside {{PRODUCTNAME}}, a detector service is called with an already opened stream that is used to find out the content type. If no stream is given, this indicates that someone else is using the service (for example, outside {{PRODUCTNAME}}). The detector is then allowed to open its own stream by using the URL part of the <code>MediaDescriptor</code>. If the resulting stream is seekable, it should be set inside the descriptor after its position is reset to 0. If the stream is not seekable, it must not be set. Please follow the already mentioned rules for handling streams.
+
If a type has been detected based on these attributes, {{AOo}} can verify the result with code that checks the content itself, not only its external attributes. For this purpose each type may have an attribute "DetectService". It is the implementation or service name of an object that implements the abstract service <idl>com.sun.star.document.ExtendedTypeDetection</idl>. This object examines the content. It receives a MediaDescriptor containing the name of the type to confirm and returns this name if it matches the content. The DetectService may return another type name if it knows that this type matches better, even if the external attributes did not select it in the first place.
  
{{PDL1}}
+
If the external attributes did not help {{AOo}} to find a type, it instantiates all registered DetectServices and asks them to check the content until one of them returns a valid type name. A DetectService can tell that it is called for "guessing" rather than for confirmation, because in this case no type name is passed to it in the MediaDescriptor.
  
[[Category:Documentation/Developer's Guide/Office Development]]
+
The next step is to check whether a frame loader is registered for the detected type. If no frame loader is found, the generic frame loader implementation of {{AOo}} is used. As mentioned above, this service detects a filter if the TypeDetection service has not already provided this information. It does so with a filter query at the <idl>com.sun.star.document.FilterFactory</idl> service. This query encapsulates the algorithm by which {{AOo}} assigns a filter to a type. The result of the query is the internal name of the desired filter, and the FilterFactory can then be asked to create the filter. Note: filter queries can return more than one filter name, depending on the input. If no preferences have been given, the first one in the returned sequence wins.
  
 +
The most important external attribute of a content is its file extension, and often this is the only one used. As extensions do not need to be unique, {{AOo}} may find several possible types for an extension. While there is a preferred type (or at least there should be one), API programmers can override it with a type preselection. It is also possible to use a filter preselection or a document type preselection. The latter can be seen as a suggestion to {{AOo}} to load a content with a particular {{AOo}} application. If this is possible, {{AOo}} does so; otherwise it proceeds as usual. One of the most common use cases is loading an HTML file in Calc from the command line: using "soffice -Calc $FILENAME" instead of just "soffice $FILENAME" triggers a document type preselection.
  
 
{{PDL1}}
 
  
 
[[Category:Documentation/Developer's Guide/Office Development]]
 

Latest revision as of 14:36, 9 August 2021



Structure of the configuration

As described previously, detecting types and finding filters in Apache OpenOffice is carried out by the com.sun.star.document.TypeDetection service, which uses configuration data as input. The configuration node that contains all of this information is org.openoffice.TypeDetection. Its basic structure is shown below:

Structure of org.openoffice.Office.TypeDetection Configuration Branch

As shown on the left, the node consists of structures that the Configuration Manager calls sets. Unlike configuration lists, sets are extensible configuration nodes: the Configuration Manager can merge several files that contain the same node and present all set elements found in any of the merged files as part of one common set. Lists behave differently: if the same list is found in several configuration files, one of them overwrites the others. The ability to merge configuration nodes makes it possible to deploy filter configuration data, and therefore filters, in extensions. Without it, all filter configuration data would have to be defined in the Apache OpenOffice installation.
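The difference between sets and lists can be illustrated with a small Python model. This is only a sketch, not the Configuration Manager's implementation: the layer dictionaries, the `merge_layers` helper and the element names are hypothetical, and the rule that the layer applied last wins for lists is an assumption made for the example.

```python
def merge_layers(layers):
    """Merge hypothetical configuration layers in the order given."""
    merged = {"set": {}, "list": []}
    for layer in layers:
        # Set elements are keyed by name, so elements from every layer survive.
        merged["set"].update(layer.get("set", {}))
        # A list is one value: a layer that defines it replaces it as a whole.
        # (Assumption for this sketch: the layer applied last wins.)
        if "list" in layer:
            merged["list"] = list(layer["list"])
    return merged


# Hypothetical data: one layer from the installation, one from an extension.
installation = {
    "set": {"example_text": {"Extensions": ["txt"]}},
    "list": ["a", "b"],
}
extension = {
    "set": {"example_new_format": {"Extensions": ["xyz"]}},
    "list": ["c"],
}

merged = merge_layers([installation, extension])
print(sorted(merged["set"]))  # both set elements are present
print(merged["list"])         # only the extension's list remains
```

In this model the extension can contribute a new type without touching the installation data, which is the property that makes deploying filters in extensions possible.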

There are three sets: types, filters and frame loaders. A type describes a kind of content, while filters and frame loaders describe objects that can be used to load such content into an Apache OpenOffice document. The arrows in the picture point to structures on the right side that show the content (properties) of the different set elements. Similar to 1:n relations in a database, every filter or frame loader is registered for one or more types.

Warning: To add filters to the configuration, do not edit the installed configuration files of Apache OpenOffice directly. Provide the data as an extension instead and install this extension for a single user or for all users.

TypeDetection

Before the properties of types, filters and frame loaders are described in detail, let's look at how the type detection uses them to detect types and filters. The com.sun.star.document.TypeDetection service can be used just to detect the type of a particular content. While a type is being detected, some information about a suitable filter for that type may already become available. If the TypeDetection is part of a loading process in which a filter must be detected as well as a type, this suggestion can be used to save an extra filter detection step. Otherwise, the generic frame loader would have to carry out this detection by accessing the filter configuration data through the com.sun.star.document.FilterFactory service.

When the TypeDetection receives a URL or a MediaDescriptor, it first checks some "external" attributes of the content specified this way. These can be a file extension, a URL pattern or other properties in the MediaDescriptor. If the MediaDescriptor does not already contain the name of the content type, the TypeDetection looks for the entry in the "Types" part of the TypeDetection configuration that best matches these attributes. See the chapter about the type properties for the kinds of attributes that are available and how they are used.
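The check of external attributes can be sketched in Python as follows. This is a simplified model, not the service's actual algorithm: the `TYPES` table, the type names, the `flat_detect` function and the dictionary keys used for the descriptor are assumptions made for illustration, and real matching considers more attributes than the two shown here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse
import posixpath

# Hypothetical "Types" data; names and values are illustrative only.
TYPES = {
    "example_text": {"Extensions": ["txt"], "URLPattern": [], "Preferred": True},
    "example_csv": {"Extensions": ["csv", "txt"], "URLPattern": [], "Preferred": False},
    "example_macro": {"Extensions": [], "URLPattern": ["macro:*"], "Preferred": False},
}


def flat_detect(descriptor):
    """Return a type name based on external attributes only, or None."""
    # A type name already present in the descriptor is taken as given.
    if descriptor.get("TypeName"):
        return descriptor["TypeName"]
    url = descriptor["URL"]
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    candidates = [
        name
        for name, props in TYPES.items()
        if any(fnmatchcase(url, pattern) for pattern in props["URLPattern"])
        or (ext and ext in props["Extensions"])
    ]
    # Several types may share an extension; preferred types are tried first.
    candidates.sort(key=lambda name: not TYPES[name]["Preferred"])
    return candidates[0] if candidates else None


text_type = flat_detect({"URL": "file:///tmp/report.txt"})
csv_type = flat_detect({"URL": "file:///tmp/data.csv"})
macro_type = flat_detect({"URL": "macro:run"})
unknown_type = flat_detect({"URL": "file:///tmp/file.unknown"})
print(text_type, csv_type, macro_type, unknown_type)
```

Because only the URL is inspected, a renamed file would be assigned the wrong type by this step alone.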

If a type has been detected based on these attributes, Apache OpenOffice can verify the result with code that checks the content itself, not only its external attributes. For this purpose each type may have an attribute "DetectService". It is the implementation or service name of an object that implements the abstract service com.sun.star.document.ExtendedTypeDetection. This object examines the content. It receives a MediaDescriptor containing the name of the type to confirm and returns this name if it matches the content. The DetectService may return another type name if it knows that this type matches better, even if the external attributes did not select it in the first place.

If the external attributes did not help Apache OpenOffice to find a type, it instantiates all registered DetectServices and asks them to check the content until one of them returns a valid type name. A DetectService can tell that it is called for "guessing" rather than for confirmation, because in this case no type name is passed to it in the MediaDescriptor.
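The two ways a DetectService can be called, confirmation and guessing, might be modelled like this. The sketch is plain Python rather than UNO code: `MagicDetector`, `deep_detect`, the type names and the "Content" key holding raw bytes are hypothetical (a real detector receives a stream in the MediaDescriptor), and falling back to guessing after a failed confirmation is an assumption, since the text above does not cover that case.

```python
class MagicDetector:
    """Hypothetical DetectService that recognises content by its first bytes."""

    def __init__(self, type_name, magic):
        self.type_name = type_name
        self.magic = magic
        self.calls = []  # records how each call was made, for illustration

    def detect(self, descriptor):
        # A type name in the descriptor means "confirm"; none means "guess".
        self.calls.append("confirm" if descriptor.get("TypeName") else "guess")
        matches = descriptor["Content"].startswith(self.magic)
        return self.type_name if matches else ""


def deep_detect(descriptor, flat_type, detectors):
    """Confirm flat_type if it has a detector; otherwise ask all detectors."""
    if flat_type and flat_type in detectors:
        answer = detectors[flat_type].detect({**descriptor, "TypeName": flat_type})
        if answer:
            return answer
        # Assumption: a failed confirmation falls through to guessing.
    elif flat_type:
        return flat_type  # no DetectService registered, nothing to confirm
    for detector in detectors.values():
        answer = detector.detect(descriptor)  # no type name is passed
        if answer:
            return answer
    return None


zip_detector = MagicDetector("example_zip", b"PK")
pdf_detector = MagicDetector("example_pdf", b"%PDF")
detectors = {"example_zip": zip_detector, "example_pdf": pdf_detector}

confirmed = deep_detect({"Content": b"PK\x03\x04"}, "example_zip", detectors)
guessed = deep_detect({"Content": b"%PDF-1.4"}, None, detectors)
print(confirmed, guessed)
```

The `calls` list shows that the same detector object can see both modes and can distinguish them solely from the presence of the type name.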

The next step is to check whether a frame loader is registered for the detected type. If no frame loader is found, the generic frame loader implementation of Apache OpenOffice is used. As mentioned above, this service detects a filter if the TypeDetection service has not already provided this information. It does so with a filter query at the com.sun.star.document.FilterFactory service. This query encapsulates the algorithm by which Apache OpenOffice assigns a filter to a type. The result of the query is the internal name of the desired filter, and the FilterFactory can then be asked to create the filter. Note: filter queries can return more than one filter name, depending on the input. If no preferences have been given, the first one in the returned sequence wins.
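The rule that the first name in the returned sequence wins can be shown with a toy query in Python. This does not reproduce the FilterFactory query syntax or its ranking: the `FILTERS` table, the filter names, the flag values and both functions are hypothetical stand-ins.

```python
# Hypothetical filter data in registration order; all names are illustrative.
FILTERS = [
    {"Name": "example_text_import", "Type": "example_text", "Flags": {"IMPORT", "EXPORT"}},
    {"Name": "example_text_encoded", "Type": "example_text", "Flags": {"IMPORT"}},
    {"Name": "example_csv_import", "Type": "example_csv", "Flags": {"IMPORT"}},
]


def query_filters(type_name, required_flags):
    """Return the names of all filters registered for a type, in order."""
    return [
        entry["Name"]
        for entry in FILTERS
        if entry["Type"] == type_name and required_flags <= entry["Flags"]
    ]


def select_filter(type_name, preferred=None):
    """Pick a filter name: an explicit preference first, else the first hit."""
    names = query_filters(type_name, {"IMPORT"})
    if preferred in names:
        return preferred
    return names[0] if names else None


default_choice = select_filter("example_text")
explicit_choice = select_filter("example_text", preferred="example_text_encoded")
print(default_choice, explicit_choice)
```

Once a name has been selected this way, the factory would be asked to create the filter of that name.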

The most important external attribute of a content is its file extension, and often this is the only one used. As extensions do not need to be unique, Apache OpenOffice may find several possible types for an extension. While there is a preferred type (or at least there should be one), API programmers can override it with a type preselection. It is also possible to use a filter preselection or a document type preselection. The latter can be seen as a suggestion to Apache OpenOffice to load a content with a particular Apache OpenOffice application. If this is possible, Apache OpenOffice does so; otherwise it proceeds as usual. One of the most common use cases is loading an HTML file in Calc from the command line: using "soffice -Calc $FILENAME" instead of just "soffice $FILENAME" triggers a document type preselection.
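How a document type preselection acts as a suggestion rather than a command can be sketched in Python. The candidate table, the type names, the short application labels and the `choose_type` function are assumptions made for this example and do not reflect the real configuration entries.

```python
# Hypothetical candidate types found for the extension "html".
CANDIDATES = [
    {"Type": "example_html_writer", "Application": "writer", "Preferred": True},
    {"Type": "example_html_calc", "Application": "calc", "Preferred": False},
]


def choose_type(candidates, preselected_application=None):
    """Honour a document type preselection if possible, else proceed as usual."""
    if preselected_application:
        for candidate in candidates:
            if candidate["Application"] == preselected_application:
                return candidate["Type"]
        # The suggestion cannot be followed, so it is ignored.
    for candidate in candidates:
        if candidate["Preferred"]:
            return candidate["Type"]
    return candidates[0]["Type"] if candidates else None


usual = choose_type(CANDIDATES)
with_calc = choose_type(CANDIDATES, preselected_application="calc")
impossible = choose_type(CANDIDATES, preselected_application="draw")
print(usual, with_calc, impossible)
```

The third call shows the fallback: a preselection that no candidate can satisfy leaves the usual choice in place.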

Content on this page is licensed under the Public Documentation License (PDL).