Documentation/DevGuide/OfficeDev/XML Filter Detection

From Apache OpenOffice Wiki

Latest revision as of 12:31, 3 January 2021



XML files can conform to many different DTD specifications, so a single filter and file type definition cannot handle every available format. To let Apache OpenOffice handle multiple filter definitions and implementations, you need to implement an additional filter detection module that determines the type of XML file being read from its DocType declaration.

To accomplish this, you can implement a filter detection service, com.sun.star.document.ExtendedTypeDetection, that handles and distinguishes between many different XML-based file formats. This type of service supersedes the basic flat detection, which uses the file's suffix to determine the type, and instead carries out a deep detection, which uses the file's internal structure and content to detect its true type.
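
The difference between the two modes can be illustrated with a minimal sketch in plain standard C++ rather than the UNO API. The function names and the type name generic_XML are hypothetical and chosen only for illustration; writer_DocBook_File and the DocBook identifier are taken from the sample type definition on this page.

```cpp
#include <string>

// Flat detection: guess the type from the file suffix alone.
// "generic_XML" is a hypothetical type name used only in this sketch.
std::string detectFlat(const std::string& fileName) {
    const std::string suffix = ".xml";
    if (fileName.size() >= suffix.size() &&
        fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return "generic_XML";
    }
    return "";
}

// Deep detection: inspect the file content to find the true type.
std::string detectDeep(const std::string& header) {
    if (header.find("-//OASIS//DTD DocBook XML") != std::string::npos) {
        return "writer_DocBook_File";
    }
    return "";  // empty result: the caller falls back to flat detection
}
```

Two files that both end in .xml get the same answer from detectFlat(), whereas detectDeep() can tell a DocBook file from any other XML file by looking at its content.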

Requirements for Deep Detection

Implementing a deep detection module that can identify one or more unique XML types has three requirements:

  • An extended type definition for describing the format in more detail (TypeDetection.xcu).
  • A DetectService implementation.
  • A DetectService definition (TypeDetection.xcu).

Extending the File Type Definition

Since many different XML files can conform to different DTDs, the type definition of a particular XML file needs to be extended. To do this, some or all of the DocType information can be contained as part of the file type definition. This information is held as part of the ClipboardFormat property of the type node. A unique prefix (doctype: in the sample below) identifies the string at this point in the sequence as being a DocType declaration.

Sample Type definition:

  <node oor:name="writer_DocBook_File" oor:op="replace">
      <prop oor:name="UIName">
          <value xml:lang="en-US">DocBook</value>
      </prop>
      <prop oor:name="Data">
          <value>0,,doctype:-//OASIS//DTD DocBook XML V4.1.2//EN,,XML,20002,</value>
      </prop>
  </node>
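
To show how the doctype: marker can be recovered from such a Data value, here is a minimal sketch in plain standard C++. The helper names splitFields and docTypeFromData are hypothetical. The sketch searches every field for the prefix rather than assuming a fixed position for the ClipboardFormat entry.

```cpp
#include <string>
#include <vector>

// Split a comma-separated Data value, keeping empty fields.
std::vector<std::string> splitFields(const std::string& data) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = data.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(data.substr(start));
            break;
        }
        fields.push_back(data.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Return the DocType carried by the first field that starts with
// "doctype:", or an empty string if no field has that prefix.
std::string docTypeFromData(const std::string& data) {
    const std::string prefix = "doctype:";
    for (const std::string& field : splitFields(data)) {
        if (field.compare(0, prefix.size(), prefix) == 0) {
            return field.substr(prefix.size());
        }
    }
    return "";
}
```

Applied to the Data value of the sample above, docTypeFromData() yields the DocBook public identifier with the doctype: prefix removed.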

The ExtendedTypeDetection Service Implementation

In order for the type detection code to function as an ExtendedTypeDetection service, you must implement the detect() method as defined by the com.sun.star.document.XExtendedFilterDetection interface definition:

  string detect( [inout]sequence<com::sun::star::beans::PropertyValue > Descriptor );

This method supplies you with a sequence of PropertyValue structs from which you can extract the current TypeName and the URL of the file being loaded:

  ::rtl::OUString SAL_CALL FilterDetect::detect(com::sun::star::uno::Sequence< com::sun::star::beans::PropertyValue >& aArguments ) throw (com::sun::star::uno::RuntimeException)
  {
      ::rtl::OUString sTypeName;   // type name currently set in the descriptor
      ::rtl::OUString sUrl;        // location of the file being loaded
      const PropertyValue * pValue = aArguments.getConstArray();
      sal_Int32 nLength = aArguments.getLength();
      for (sal_Int32 i = 0; i < nLength; i++) {
          if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("TypeName"))) {
              pValue[i].Value >>= sTypeName;
          }
          else if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("URL"))) {
              pValue[i].Value >>= sUrl;
          }
      }
      // ... the detection steps described in the rest of this section follow here ...
  }

Once you have the URL of the file, you can then use it to create a ::ucb::Content from which you can open an XInputStream to the file:

  Reference< com::sun::star::ucb::XCommandEnvironment > xEnv;   // left empty
  ::ucb::Content aContent(sUrl,xEnv);
  Reference< com::sun::star::io::XInputStream > xInStream = aContent.openStream();

You can now use this XInputStream to read the header of the file being loaded. Because the exact location of the DocType information within the file is not known, the example reads the first 1000 bytes:

  ::rtl::OString resultString;
  com::sun::star::uno::Sequence< sal_Int8 > aData;
  // readBytes() returns the number of bytes actually read, which may be fewer than 1000
  sal_Int32 bytesRead = xInStream->readBytes (aData, 1000);
  resultString = ::rtl::OString( (const sal_Char *)aData.getConstArray(), bytesRead);
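
Once the header is available as a string, the DocType declaration can be located in it. The following is a minimal sketch in plain standard C++, using std::string in place of ::rtl::OString. The function name publicIdFromHeader is hypothetical, and the sketch handles only declarations that use a PUBLIC identifier. It is not a full XML parser.

```cpp
#include <string>

// Return the quoted public identifier that follows "<!DOCTYPE ... PUBLIC",
// or an empty string if the header contains no such declaration.
std::string publicIdFromHeader(const std::string& header) {
    std::string::size_type doctype = header.find("<!DOCTYPE");
    if (doctype == std::string::npos) return "";
    std::string::size_type pub = header.find("PUBLIC", doctype);
    if (pub == std::string::npos) return "";
    // The identifier is delimited by a pair of matching quotes.
    std::string::size_type open = header.find_first_of("\"'", pub);
    if (open == std::string::npos) return "";
    std::string::size_type close = header.find(header[open], open + 1);
    if (close == std::string::npos) return "";  // truncated declaration
    return header.substr(open + 1, close - open - 1);
}
```

If the declaration is cut off by the 1000-byte limit, the function returns an empty string, which a caller can treat the same way as a file without a DocType.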

Once you have this information, you can start looking for a type that describes the file being loaded. To do this, you need to get a list of the types currently supported:

  Reference <XNameAccess> xTypeCont(mxMSF->createInstance(OUString::createFromAscii(
                                  "com.sun.star.document.TypeDetection" )),UNO_QUERY);
  Sequence< ::rtl::OUString > myTypes = xTypeCont->getElementNames();
  nLength = myTypes.getLength();

For each of these types, you must first determine whether the ClipboardFormat property contains a DocType:

  Loc_of_ClipboardFormat=...;
  Sequence< ::rtl::OUString > ClipboardFormatSeq;
  Type_Props[Loc_of_ClipboardFormat].Value >>= ClipboardFormatSeq;
  // check each entry of the sequence for the doctype: prefix
  for (sal_Int32 k = 0; k < ClipboardFormatSeq.getLength(); k++) {
          if (ClipboardFormatSeq[k].match(OUString::createFromAscii("doctype:"))) {
                      //if it contains a DocType, start to compare to header
          }
  }

All the possible DocType declarations of the file types can be checked to determine a match. If a match is found, the type corresponding to the match is returned. If no match is found, an empty string is returned. This will force Apache OpenOffice into flat detection mode.
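
The matching rule described above can be sketched as follows in plain standard C++. The names TypeEntry and matchType are hypothetical. Each entry is assumed to pair a type name with the DocType string already extracted from that type's ClipboardFormat property.

```cpp
#include <string>
#include <utility>
#include <vector>

// first: type name, second: DocType stored for that type (may be empty).
typedef std::pair<std::string, std::string> TypeEntry;

// Return the name of the first type whose DocType occurs in the header.
// An empty result means no match, so flat detection takes over.
std::string matchType(const std::string& header,
                      const std::vector<TypeEntry>& types) {
    for (const TypeEntry& entry : types) {
        if (!entry.second.empty() &&
            header.find(entry.second) != std::string::npos) {
            return entry.first;
        }
    }
    return "";
}
```

Types without a DocType are skipped explicitly, because an empty search string would otherwise match every header.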

TypeDetection.xcu DetectServices Entry

Now that you have created the ExtendedTypeDetection service implementation, you need to tell Apache OpenOffice when to use this service.

First create a DetectServices node, unless one already exists. Then add the information specific to the detection service you implemented: the name of the service and the file types that use it.

  <node oor:name="DetectServices">
      <node oor:name="com.sun.star.comp.filters.XMLDetect" oor:op="replace">
          <prop oor:name="ServiceName">
              <value xml:lang="en-US">com.sun.star.comp.filters.XMLDetect</value>
          </prop>
          <prop oor:name="Types">
              <value>writer_DocBook_File</value>
              <value>writer_Flat_XML_File</value>
          </prop>
      </node>
  </node>
Content on this page is licensed under the Public Documentation License (PDL).