Difference between revisions of "Documentation/DevGuide/OfficeDev/XML Filter Detection"

From Apache OpenOffice Wiki
Revision as of 15:06, 11 February 2014



The number of XML files that conform to differing DTD specifications means that a single filter and file type definition is insufficient to handle all of the possible formats available. In order to allow OpenOffice.org to handle multiple filter definitions and implementations, it is necessary to implement an additional filter detection module that is capable of determining the type of XML file being read, based on its DocType declaration.

To accomplish this, a filter detection service com.sun.star.document.ExtendedTypeDetection can be implemented, which is capable of handling and distinguishing between many different XML-based file formats. This type of service supersedes the basic flat detection, which uses the file's suffix to determine the type, and instead carries out a deep detection, which uses the file's internal structure and content to detect its true type.

Requirements for Deep Detection

There are three requirements for implementing a deep detection module that is capable of identifying one or more unique XML types:

  • An extended type definition for describing the format in more detail (TypeDetection.xcu).
  • A DetectService implementation.
  • A DetectService definition (TypeDetection.xcu).

Extending the File Type Definition

Since many different XML files can conform to different DTDs, the type definition of a particular XML file needs to be extended. To do this, some or all of the DocType information can be contained as part of the file type definition. This information is held as part of the ClipboardFormat property of the type node. A unique namespace or prefix (doctype: in the sample below) identifies the string at this point in the sequence as being a DocType declaration.

Sample Type definition:

  <node oor:name="writer_DocBook_File" oor:op="replace">
      <prop oor:name="UIName">
          <value xml:lang="en-US">DocBook</value>
      </prop>
      <prop oor:name="Data">
          <value>0,,doctype:-//OASIS//DTD DocBook XML V4.1.2//EN,,XML,20002,</value>
      </prop>
  </node>
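
To illustrate how the DocType string can be recovered from such a Data value, here is a minimal sketch in standard C++ (std::string instead of the UNO string classes). It assumes the value is a plain comma-separated list, as the sample suggests, and that the DocType field is the one starting with doctype:. The helper names splitDataFields and extractDocType are hypothetical and not part of any OpenOffice.org API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a Data value on commas, keeping empty fields.
std::vector<std::string> splitDataFields(const std::string& data)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type comma = data.find(',', start);
        if (comma == std::string::npos) {
            // Last field (may be empty when the value ends with a comma).
            fields.push_back(data.substr(start));
            break;
        }
        fields.push_back(data.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Hypothetical helper: return the text following "doctype:" in the first
// field that carries that prefix, or an empty string if no field does.
std::string extractDocType(const std::string& data)
{
    const std::string prefix = "doctype:";
    for (const std::string& field : splitDataFields(data)) {
        if (field.compare(0, prefix.size(), prefix) == 0)
            return field.substr(prefix.size());
    }
    return std::string();
}
```

Scanning for the prefix rather than relying on a fixed field position keeps the sketch independent of the exact layout of the Data value, which this page does not specify beyond the sample.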

The ExtendedTypeDetection Service Implementation

In order for the type detection code to function as an ExtendedTypeDetection service, you must implement the detect() method as defined by the com.sun.star.document.XExtendedFilterDetection interface definition:

  string detect( [inout]sequence<com::sun::star::beans::PropertyValue > Descriptor );

This method supplies you with a sequence of PropertyValues from which you can extract the current TypeName and the URL of the file being loaded:

  ::rtl::OUString SAL_CALL FilterDetect::detect(com::sun::star::uno::Sequence< com::sun::star::beans::PropertyValue >& aArguments ) throw (com::sun::star::uno::RuntimeException)
  {
  // Values extracted from the descriptor passed to detect()
  ::rtl::OUString sTypeName;
  ::rtl::OUString sUrl;
  const PropertyValue * pValue = aArguments.getConstArray();
  sal_Int32 nLength;
  ::rtl::OString resultString;
  nLength = aArguments.getLength();
  for (sal_Int32 i = 0; i < nLength; i++) {
          if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("TypeName"))) {
                  pValue[i].Value >>= sTypeName;
          }
          else if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("URL"))) {
                  pValue[i].Value >>= sUrl;
          }
  }
  // The body of detect() continues in the fragments that follow.

Once you have the URL of the file, you can then use it to create a ::ucb::Content from which you can open an XInputStream to the file:

  Reference< com::sun::star::ucb::XCommandEnvironment > xEnv;
  ::ucb::Content aContent(sUrl,xEnv);
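  // Declared here so that the fragment is complete; drop the type if xInStream already exists in your class
  Reference< com::sun::star::io::XInputStream > xInStream = aContent.openStream();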

You can now use this XInputStream to read the header of the file being loaded. Because the exact location of the DocType information within the file is not known, the first 1000 bytes of information will be read:

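  // resultString was already declared at the top of detect()
  com::sun::star::uno::Sequence< sal_Int8 > aData;
  // readBytes() returns the number of bytes actually read, which may be fewer than 1000
  sal_Int32 bytesRead = xInStream->readBytes (aData, 1000);
  resultString = ::rtl::OString( (const sal_Char *)aData.getConstArray(), bytesRead);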
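
The same bounded read can be sketched with the C++ standard library, which may help when experimenting outside of OpenOffice.org. This is only an analogy under stated assumptions: std::istream stands in for XInputStream, and the helper name readHeader is hypothetical. As in the fragment above, the resulting string is built from the number of bytes actually read, not from the number requested.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read at most maxBytes from the stream and return
// exactly the bytes that were available, so short files are handled too.
std::string readHeader(std::istream& in, std::size_t maxBytes)
{
    std::string buffer(maxBytes, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(maxBytes));
    // gcount() reports how many bytes the last read() actually delivered.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

For example, calling readHeader on a std::istringstream holding a three-byte document with maxBytes set to 1000 yields a three-byte string instead of a padded 1000-byte buffer.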

Once you have this information, you can start looking for a type that describes the file being loaded. In order to do this, you need to get a list of the types currently supported:

  Reference <XNameAccess> xTypeCont(mxMSF->createInstance(OUString::createFromAscii(
                                  "com.sun.star.document.TypeDetection" )),UNO_QUERY);
  // The space after '<' avoids the "<:" digraph that older compilers see in "<::"
  Sequence< ::rtl::OUString > myTypes = xTypeCont->getElementNames();
  nLength = myTypes.getLength();

For each of these types, you must first determine whether the ClipboardFormat property contains a DocType:

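  // Index of the ClipboardFormat entry within Type_Props (lookup elided)
  Loc_of_ClipboardFormat=...;
  Sequence< ::rtl::OUString > ClipboardFormatSeq;
  Type_Props[Loc_of_ClipboardFormat].Value >>= ClipboardFormatSeq;
  // Check each entry of the sequence for the "doctype:" prefix
  for (sal_Int32 j = 0; j < ClipboardFormatSeq.getLength(); j++) {
          if (ClipboardFormatSeq[j].match(OUString::createFromAscii("doctype:"))) {
                      // if it contains a DocType, start to compare to header
          }
  }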

All of the possible DocType declarations of the file types can be checked to determine a match. If a match is found, the type corresponding to the match is returned. If no match is found, an empty string is returned. This will force OpenOffice.org into flat detection mode.
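
The matching step described above can be sketched as follows, again in standard C++ without the UNO classes. The sketch assumes that a match means the type's DocType string occurs somewhere in the header that was read. This page does not spell out the exact comparison, so treat that as an assumption. The function name detectType and the candidate list are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Each candidate pairs a type name with the DocType string taken from its
// type definition (empty if the type declares no DocType).
typedef std::vector<std::pair<std::string, std::string>> TypeCandidates;

// Hypothetical helper: return the name of the first type whose DocType
// string occurs in the header, or an empty string if nothing matches.
std::string detectType(const std::string& header, const TypeCandidates& candidates)
{
    for (const auto& candidate : candidates) {
        const std::string& docType = candidate.second;
        // Skip types without a DocType; an empty needle would match everything.
        if (!docType.empty() && header.find(docType) != std::string::npos)
            return candidate.first;
    }
    // An empty result corresponds to falling back to flat detection.
    return std::string();
}
```

Returning the first match keeps the sketch simple; a real implementation might prefer the longest or most specific DocType when several types match.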

TypeDetection.xcu DetectServices Entry

Now that you have created the ExtendedTypeDetection service implementation, you need to tell OpenOffice.org when to use this service.

First create a DetectServices node, unless one already exists, and then add the information specific to the detection service that has been implemented, that is, the name of the service and the file types that use it.

  <node oor:name="DetectServices">
  <node oor:name="com.sun.star.comp.filters.XMLDetect" oor:op="replace">
          <prop oor:name="ServiceName">
                  <value xml:lang="en-US">com.sun.star.comp.filters.XMLDetect</value>
          </prop>
          <prop oor:name="Types">
                  <value>writer_DocBook_File</value>
                  <value>writer_Flat_XML_File</value>
          </prop>
  </node>
  </node>
Content on this page is licensed under the Public Documentation License (PDL).