Difference between revisions of "Documentation/DevGuide/OfficeDev/XML Filter Detection"
Revision as of 11:21, 13 May 2009
XML files conform to many different DTD specifications, so a single filter and file type definition cannot handle every possible format. To let OpenOffice.org work with multiple filter definitions and implementations, an additional filter detection module is needed that determines the type of the XML file being read from its DocType declaration.
To accomplish this, you can implement a filter detection service, com.sun.star.document.ExtendedTypeDetection, that handles and distinguishes between many different XML-based file formats. This type of service supersedes basic flat detection, which uses the file's suffix to determine the type, and instead carries out deep detection, which uses the file's internal structure and content to detect its true type.
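The difference between the two detection modes can be sketched in plain standard C++, without the UNO API. This is only an illustration: the functions `flatDetect` and `deepDetect` and the type name `generic_XML_File` are hypothetical, and the DocType identifier is the DocBook one used in the sample type definition on this page.

```cpp
#include <string>

// Flat detection: guess the type from the file suffix alone.
// "generic_XML_File" is a made-up type name for this sketch.
std::string flatDetect(const std::string& fileName)
{
    const std::string::size_type dot = fileName.rfind('.');
    if (dot == std::string::npos)
        return std::string();
    return fileName.substr(dot + 1) == "xml" ? "generic_XML_File" : std::string();
}

// Deep detection: look inside the content for a known DocType identifier.
std::string deepDetect(const std::string& header)
{
    if (header.find("-//OASIS//DTD DocBook XML V4.1.2//EN") != std::string::npos)
        return "writer_DocBook_File";
    return std::string();
}
```

Every file ending in .xml looks the same to `flatDetect`, whereas `deepDetect` can tell a DocBook file from any other XML file because it reads the content.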
Requirements for Deep Detection
There are three requirements for implementing a deep detection module that is capable of identifying one or more unique XML types. These include:
- An extended type definition for describing the format in more detail (TypeDetection.xcu).
- A DetectService implementation.
- A DetectService definition (TypeDetection.xcu).
Extending the File Type Definition
Because different XML files can conform to different DTDs, the type definition of a particular XML file needs to be extended. To do this, some or all of the DocType information can be stored in the file type definition, as part of the ClipboardFormat property of the type node. A unique namespace or prefix identifies the string at this point in the sequence as a DocType declaration.
Sample type definition:
<node oor:name="writer_DocBook_File" oor:op="replace">
    <prop oor:name="UIName">
        <value xml:lang="en-US">DocBook</value>
    </prop>
    <prop oor:name="Data">
        <value> 0, , doctype:-//OASIS//DTD DocBook XML V4.1.2//EN, , XML, 20002, </value>
    </prop>
</node>
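To illustrate how the doctype: prefix marks the DocType entry, the following sketch pulls the identifier out of a comma-separated Data value like the one in the sample. It is written in plain standard C++ and is not how OpenOffice.org itself parses the configuration; the helper names `trim` and `docTypeFromData` are hypothetical, and the sketch assumes the identifier contains no comma.

```cpp
#include <sstream>
#include <string>

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Return the DocType identifier stored in a comma-separated Data value,
// or an empty string if no field starts with "doctype:".
std::string docTypeFromData(const std::string& data)
{
    const std::string prefix = "doctype:";
    std::istringstream fields(data);
    std::string field;
    while (std::getline(fields, field, ','))
    {
        const std::string value = trim(field);
        if (value.compare(0, prefix.size(), prefix) == 0)
            return value.substr(prefix.size());
    }
    return std::string();
}
```

Types whose Data value has no doctype: field yield an empty string, which a detection service can treat as "nothing to compare against".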
The ExtendedTypeDetection Service Implementation
For the type detection code to function as an ExtendedTypeDetection service, you must implement the detect() method defined by the com.sun.star.document.XExtendedFilterDetection interface:
string detect( [inout]sequence<com::sun::star::beans::PropertyValue > Descriptor );
This method supplies a sequence of PropertyValue structs from which you can extract the current TypeName and the URL of the file being loaded:
::rtl::OUString SAL_CALL FilterDetect::detect(
    com::sun::star::uno::Sequence< com::sun::star::beans::PropertyValue >& aArguments )
        throw (com::sun::star::uno::RuntimeException)
{
    const PropertyValue * pValue = aArguments.getConstArray();
    sal_Int32 nLength;
    ::rtl::OString resultString;
    ::rtl::OUString sUrl; // location of the file being loaded
    nLength = aArguments.getLength();
    // Walk the descriptor and pick out the entries of interest
    for (sal_Int32 i = 0; i < nLength; i++)
    {
        if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("TypeName")))
        {
            // The preselected type name is available here; this sample does not use it
        }
        else if (pValue[i].Name.equalsAsciiL(RTL_CONSTASCII_STRINGPARAM("URL")))
        {
            pValue[i].Value >>= sUrl;
        }
    }
    // The body of detect() continues in the fragments that follow
Once you have the URL of the file, you can use it to create a ::ucb::Content, from which you can open an XInputStream to the file:
Reference< com::sun::star::ucb::XCommandEnvironment > xEnv;
::ucb::Content aContent(sUrl, xEnv);
xInStream = aContent.openStream();
You can now use this XInputStream to read the header of the file being loaded. Because the exact location of the DocType information within the file is not known, the first 1000 bytes are read:
::rtl::OString resultString;
com::sun::star::uno::Sequence< sal_Int8 > aData;
// readBytes() returns the number of bytes actually read, which can be fewer than 1000
sal_Int32 bytesRead = xInStream->readBytes(aData, 1000);
resultString = ::rtl::OString((const sal_Char *)aData.getConstArray(), bytesRead);
Once you have this information, you can start looking for a type that describes the file being loaded. In order to do this, you need to get a list of the types currently supported:
Reference< XNameAccess > xTypeCont(
    mxMSF->createInstance(OUString::createFromAscii("com.sun.star.document.TypeDetection")),
    UNO_QUERY);
// Keep the space in "< ::rtl": older compilers read "<:" as a digraph for "["
Sequence< ::rtl::OUString > myTypes = xTypeCont->getElementNames();
nLength = myTypes.getLength();
For each of these types, you must first determine whether the ClipboardFormat property contains a DocType:
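// Pseudocode: the lookup of the ClipboardFormat property index is elided
Loc_of_ClipboardFormat = ...;
Sequence< ::rtl::OUString > ClipboardFormatSeq;
Type_Props[Loc_of_ClipboardFormat].Value >>= ClipboardFormatSeq;
for (sal_Int32 j = 0; j < ClipboardFormatSeq.getLength(); j++)
{
    if (ClipboardFormatSeq[j].match(OUString::createFromAscii("doctype:")))
    {
        // The entry contains a DocType: start comparing it to the header
    }
}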
All of the possible DocType declarations of the file types can be checked to determine a match. If a match is found, the type corresponding to the match is returned. If no match is found, an empty string is returned, which forces OpenOffice.org into flat detection mode.
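The matching rule described above (return the matching type, otherwise an empty string) can be sketched as follows in plain standard C++. The function `matchDocType` and the map from type name to DocType identifier are assumptions made for this illustration; a real service would obtain these values from the TypeDetection configuration.

```cpp
#include <map>
#include <string>

// Return the name of the first type whose DocType identifier occurs in
// the header, or an empty string when no registered DocType matches.
std::string matchDocType(const std::string& header,
                         const std::map<std::string, std::string>& docTypeByTypeName)
{
    typedef std::map<std::string, std::string>::const_iterator Iter;
    for (Iter it = docTypeByTypeName.begin(); it != docTypeByTypeName.end(); ++it)
    {
        // Skip types that carry no DocType; an empty needle would match anything
        if (!it->second.empty() && header.find(it->second) != std::string::npos)
            return it->first;
    }
    return std::string();
}
```

Returning an empty string rather than a guess leaves the decision to flat detection, as the text describes.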
TypeDetection.xcu DetectServices Entry
Now that you have created the ExtendedTypeDetection service implementation, you need to tell OpenOffice.org when to use this service.
First create a DetectServices node, unless one already exists, and then add the information specific to the detection service that has been implemented, that is, the name of the service and the file types that use it.
<node oor:name="DetectServices">
    <node oor:name="com.sun.star.comp.filters.XMLDetect" oor:op="replace">
        <prop oor:name="ServiceName">
            <value xml:lang="en-US">com.sun.star.comp.filters.XMLDetect</value>
        </prop>
        <prop oor:name="Types">
            <value>writer_DocBook_File</value>
            <value>writer_Flat_XML_File</value>
        </prop>
    </node>
</node>
Content on this page is licensed under the Public Documentation License (PDL).