XML and Filter

From Apache OpenOffice Wiki
Revision as of 19:44, 31 August 2006 by SergeMoutou (Talk | contribs)


In this chapter we examine how XML and filters work.

Parsing an XML File with SAX

To start, have a look at:

Using UNO's Xml sax parser via the API

It is an OooBasic program by Danny Brewer which uses SAX. To save you the search, the code is given here first.

Sax and OooBasic (Danny Brewer)

This example demonstrates several things.

  • Using the UCB's SimpleFileAccess to read from a file. (UCB = Universal Content Broker)
  • Using OOo's XML sax parser.
  • Creating a listener with Basic's CreateUnoListener() function.

If you have never used a Sax type of Xml parser before, then this example may not be for you.

OOo has a SAX XML parser available via the UNO API. The following program, in Basic, shows how to use it to parse an XML document. As the document is parsed, events are fired which print little annoying dialog boxes on the screen. (Be sure to parse a VERY SMALL XML document so that you only have to click OK about a dozen or so times!)

[oobas]
REM Listing 1 Using SAX with OooBasic
REM ***** BASIC *****
REM **** Danny Brewer (Mon Jan 12, 2004) ****
Sub Main
 cXmlFile = "C:\TestData.xml"
 cXmlUrl = ConvertToURL( cXmlFile )
 ReadXmlFromUrl( cXmlUrl )
End Sub

' This routine demonstrates how to use the Universal Content Broker's
' SimpleFileAccess to read from a local file.
Sub ReadXmlFromUrl( cUrl )
 ' The SimpleFileAccess service provides mechanisms to open, read, write files,
 ' as well as scan the directories of folders to see what they contain.
 ' The advantage of this over Basic's ugly file manipulation is that this
 ' technique works the same way in any programming language.
 ' Furthermore, the program could be running on one machine, while the SimpleFileAccess
 ' accesses files from the point of view of the machine running OOo, not the machine
 ' where, say a remote Java or Python program is running.
 oSimpleFileAccess = createUnoService( "com.sun.star.ucb.SimpleFileAccess" )

 ' Open input file.
 oInputStream = oSimpleFileAccess.openFileRead( cUrl )

 ReadXmlFromInputStream( oInputStream )

 oInputStream.closeInput()
End Sub

Sub ReadXmlFromInputStream( oInputStream )
 ' Create a Sax Xml parser.
 oSaxParser = createUnoService( "com.sun.star.xml.sax.Parser" )

 ' Create a document event handler object.
 ' As methods of this object are called, Basic arranges
 ' for global routines (see below) to be called.
 oDocEventsHandler = CreateDocumentHandler()

 ' Plug our event handler into the parser.
 ' As the parser reads an Xml document, it calls methods
 ' of the object, and hence global subroutines below
 ' to notify them of what it is seeing within the Xml document.
 oSaxParser.setDocumentHandler( oDocEventsHandler )

 ' Create an InputSource structure.
 oInputSource = createUnoStruct( "com.sun.star.xml.sax.InputSource" )
 With oInputSource
   .aInputStream = oInputStream ' plug in the input stream
 End With

 ' Now parse the document.
 ' This reads in the entire document.
 ' Methods of the oDocEventsHandler object are called as
 ' the document is scanned.
 oSaxParser.parseStream( oInputSource )
End Sub

'==================================================
' Xml Sax document handler.
'==================================================

' Global variables used by our document handler.
'
' Once the Sax parser has given us a document locator,
' the glLocatorSet variable is set to True,
' and the goLocator contains the locator object.
'
' The locator object has cool methods
' which can tell you where within the current Xml document
' being parsed the current Sax event occurred.
' The locator object implements com.sun.star.xml.sax.XLocator.
'
Private goLocator As Object
Private glLocatorSet As Boolean

' This creates an object which implements the interface
' com.sun.star.xml.sax.XDocumentHandler.
' The document handler is returned as the function result.
Function CreateDocumentHandler()
 ' Use the CreateUnoListener function of Basic.
 ' Basic creates and returns an object that implements a particular interface.
 ' When methods of that object are called,
 ' Basic will call global Basic functions whose names are the same
 ' as the methods, but prefixed with a certain prefix.
 oDocHandler = CreateUnoListener( "DocHandler_", "com.sun.star.xml.sax.XDocumentHandler" )
 glLocatorSet = False
 CreateDocumentHandler() = oDocHandler
End Function

'==================================================
' Methods of our document handler call these
' global functions.
' These methods look strangely similar to
' a SAX event handler. ;-)
' These global routines are called by the Sax parser
' as it reads in an XML document.
' These subroutines must be named with a prefix that is
' followed by the event name of the com.sun.star.xml.sax.XDocumentHandler interface.
'==================================================
Sub DocHandler_startDocument()
 Print "Start document"
End Sub

Sub DocHandler_endDocument()
 ' Print "End document"
End Sub

Sub DocHandler_startElement( cName As String, oAttributes As com.sun.star.xml.sax.XAttributeList )

 Print "Start element", cName 

End Sub

Sub DocHandler_endElement( cName As String )
 ' Print "End element", cName
End Sub

Sub DocHandler_characters( cChars As String )
End Sub

Sub DocHandler_ignorableWhitespace( cWhitespace As String )
End Sub

Sub DocHandler_processingInstruction( cTarget As String, cData As String )
End Sub

Sub DocHandler_setDocumentLocator( oLocator As com.sun.star.xml.sax.XLocator )
 ' Save the locator object in a global variable.
 ' The locator object has valuable methods that we can
 ' call to determine where within the document a Sax event occurred
 ' (see the comment on the global variables above).
 goLocator = oLocator
 glLocatorSet = True
End Sub

This code installs an event listener with a com.sun.star.xml.sax.XDocumentHandler interface. The corresponding IDL documentation of this interface is:

//Listing 2 IDL XDocumentHandler Interface
// IDL
module com { module sun { module star { module xml { module sax {
interface XDocumentHandler: com::sun::star::uno::XInterface
{
 void startDocument()
   raises( com::sun::star::xml::sax::SAXException );
 void endDocument()
   raises( com::sun::star::xml::sax::SAXException );
 void startElement( [in] string aName,
                    [in] com::sun::star::xml::sax::XAttributeList xAttribs )
   raises( com::sun::star::xml::sax::SAXException );
 void endElement( [in] string aName )
   raises( com::sun::star::xml::sax::SAXException );
 void characters( [in] string aChars )
   raises( com::sun::star::xml::sax::SAXException );
 void ignorableWhitespace( [in] string aWhitespaces )
   raises( com::sun::star::xml::sax::SAXException );
 void processingInstruction( [in] string aTarget, [in] string aData )
   raises( com::sun::star::xml::sax::SAXException );
 void setDocumentLocator( [in] com::sun::star::xml::sax::XLocator xLocator )
   raises( com::sun::star::xml::sax::SAXException );
};
}; }; }; }; };

You can find a complete implementation of this interface in the OooBasic code of Listing 1. We then have to write an event listener in C++. If you want a reminder of how event listeners work in C++, have a look here
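The push model behind this interface (the parser reads the text and calls the handler, never the other way round) can be tried out without the UNO SDK. What follows is only a minimal sketch under stated assumptions, not the OOo parser: ToyHandler, toyParse, RecordingHandler, toyEvents and demoEventOrder are hypothetical names, std::string stands in for OUString, only three of the eight callbacks are modelled, and the toy tokenizer understands nothing but <name>, </name> and text (no attributes, comments, entities or error reporting).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for XDocumentHandler, reduced to three callbacks.
struct ToyHandler {
    virtual ~ToyHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& chars) = 0;
};

// Toy push "parser": walks the text once and calls the handler for each
// piece it recognises. It only knows <name>, </name> and plain text.
void toyParse(const std::string& xml, ToyHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) {
                break;  // unterminated tag: the toy simply stops
            }
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') {
                handler.endElement(tag.substr(1));
            } else {
                handler.startElement(tag);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) {
                next = xml.size();
            }
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Handler that records every event in the order it is received.
struct RecordingHandler : ToyHandler {
    std::vector<std::string> events;
    void startElement(const std::string& name) override {
        events.push_back("start:" + name);
    }
    void endElement(const std::string& name) override {
        events.push_back("end:" + name);
    }
    void characters(const std::string& chars) override {
        events.push_back("chars:" + chars);
    }
};

// Convenience wrapper: parse a string and return the recorded events.
std::vector<std::string> toyEvents(const std::string& xml) {
    RecordingHandler handler;
    toyParse(xml, handler);
    return handler.events;
}

// Usage: the event order for a one-element document.
void demoEventOrder() {
    std::vector<std::string> ev = toyEvents("<a>hi</a>");
    assert(ev.size() == 3);
    assert(ev[0] == "start:a");
    assert(ev[1] == "chars:hi");
    assert(ev[2] == "end:a");
}
```

The handler never asks for the next piece of the document; it only reacts to calls. That is the same division of labour as between com.sun.star.xml.sax.Parser and an XDocumentHandler implementation, whatever the real parser does internally.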

The C++ Event Listener

We first have to create a C++ class:

[cpp]
//Listing 3 Class Definition (could be in hxx file)
// C++
class XFlatXml : public ::cppu::WeakImplHelper1< ::com::sun::star::xml::sax::XDocumentHandler>
{
private:
 Reference< XMultiServiceFactory > xMSF;

public:
 XFlatXml( const Reference< XMultiServiceFactory > &r ) : xMSF( r )
 {}

 // Reference < com::sun::star::io::XOutputStream > xOutputStream;

 virtual void SAL_CALL startDocument() throw (SAXException,RuntimeException) ; 
 virtual void SAL_CALL endDocument() throw (SAXException,RuntimeException); 
 virtual void SAL_CALL startElement(const OUString& str, const Reference<XAttributeList>& attriblist) throw (SAXException,RuntimeException); 
 virtual void SAL_CALL endElement(const OUString& str) throw (SAXException,RuntimeException); 
 virtual void SAL_CALL characters(const OUString& str) throw (SAXException,RuntimeException); 
 virtual void SAL_CALL ignorableWhitespace(const OUString& str) throw (SAXException,RuntimeException); 
 virtual void SAL_CALL processingInstruction(const OUString& str, const OUString& str2) throw (SAXException,RuntimeException) ; 
 virtual void SAL_CALL setDocumentLocator(const Reference<XLocator>& doclocator) throw (SAXException,RuntimeException) ; 

};

A corresponding C++ implementation could be:

[cpp]
//Listing 4 The Event Listener Class Implementation
// C++
#include <stdio.h>   // printf
#include <iostream>  // cout, endl
using namespace std;

void XFlatXml::startDocument() throw (SAXException,RuntimeException){
 printf("StartDocument\n");
}

void XFlatXml::endDocument() throw (SAXException,RuntimeException){

 printf("EndDocument\n"); 

}

void XFlatXml::startElement(const OUString& str, const Reference<XAttributeList>& attriblist) throw (SAXException,RuntimeException){

 OString OStr = OUStringToOString ( str,RTL_TEXTENCODING_UTF8); 
 cout<< "StartElement : <" << OStr << " "; 
 for (short i=0;i<attriblist->getLength();i++){ 
   OStr = OUStringToOString ( attriblist->getNameByIndex(i),RTL_TEXTENCODING_UTF8); 
   cout << OStr <<"="; 
   OStr = OUStringToOString ( attriblist->getValueByIndex(i),RTL_TEXTENCODING_UTF8); 
   cout << OStr; 
 } 
 cout << ">" << endl; 

}

void XFlatXml::endElement(const OUString& str) throw (SAXException,RuntimeException) {

 OString OStr = OUStringToOString ( str,RTL_TEXTENCODING_UTF8); 
 cout<< "EndElement : </" << OStr << ">" << endl;  

}

void XFlatXml::characters(const OUString& str) throw (SAXException,RuntimeException) {

 OString OStr = OUStringToOString ( str,RTL_TEXTENCODING_UTF8); 
 cout<< "Characers : " << OStr << endl; 

}

void XFlatXml::ignorableWhitespace(const OUString& str) throw (SAXException,RuntimeException){

 printf("ignorableWhitespace\n"); 

}

void XFlatXml::processingInstruction(const OUString& str, const OUString& str2) throw (SAXException,RuntimeException) {

 printf("processingInstruction\n"); 

}

void XFlatXml::setDocumentLocator(const Reference<XLocator>& doclocator) throw (SAXException,RuntimeException) {

 printf("setDocumentLocator\n"); 

}

Now it is time to put this event listener to work.

Main Program

[cpp]
//Listing 5 The Main Program
// C++
int main( ) {
//retrieve an instance of the remote service manager
 Reference< XMultiServiceFactory > rOfficeServiceManager;
 rOfficeServiceManager = ooConnect();
 OSL_ENSURE(rOfficeServiceManager.is(), "Unable to connect to the office\n");

// Installing our new XDocumentHandler

 XFlatXml *xListener = new XFlatXml(rOfficeServiceManager); 
 Reference< XDocumentHandler > xHandler = static_cast< XDocumentHandler* > ( xListener ); 

// getting oSimpleFileAccess

 // com.sun.star.ucb.XSimpleFileAccess \ added in makefile 
 // #include <com/sun/star/ucb/XSimpleFileAccess.hpp> added in this file 
 // using namespace com::sun::star::ucb; added in this file 
 Reference< XSimpleFileAccess > xSFI( rOfficeServiceManager->createInstance 
      (OUString::createFromAscii("com.sun.star.ucb.SimpleFileAccess")),UNO_QUERY); 
 OSL_ENSURE(xSFI.is(), "Unable to get SimpleFileAccess service\n"); 
 // Don't forget #include <osl/file.hxx>
 OUString sUrl;
 osl::FileBase::getFileURLFromSystemPath(
                OUString::createFromAscii("/home/smoutou/TestData.xml"),sUrl);

// getting oInputStream

 // using namespace com::sun::star::io; added in this file 
 Reference <XInputStream > oInputStream=xSFI->openFileRead(sUrl); 

// oSaxParser = createUnoService( "com.sun.star.xml.sax.Parser" )

 // com.sun.star.xml.sax.XParser \ added in Makefile 
 //#include <com/sun/star/xml/sax/XParser.hpp> added in this file 
 Reference < XParser > oSaxParser( rOfficeServiceManager->createInstance 
        ( OUString::createFromAscii( "com.sun.star.xml.sax.Parser" ) ), UNO_QUERY ); 
 OSL_ENSURE(oSaxParser.is(), "Unable to get Sax Parser\n"); 
 oSaxParser->setDocumentHandler( xHandler ); 
// com.sun.star.xml.sax.InputSource \ added in Makefile 
// #include <com/sun/star/xml/sax/InputSource.hpp> added in this file 
 struct InputSource oInputSource; 
 oInputSource.aInputStream = oInputStream; 
 oSaxParser->parseStream(oInputSource); 
 oInputStream->closeInput(); 
 return 0; 

}

To test how the program works, we have to provide an XML file. We can take Danny's example:

Listing 6 XML file for test

[xml]
<Employees>

 <Employee id="101"> 
   <Name> 
     <First>John</First> 
     <Last>Smith</Last> 
   </Name> 
   <Address> 
     <Street>123 Main</Street> 
     <City>Lawrence</City> 
     <State>KS</State> 
     <Zip>66049</Zip> 
   </Address> 
   <Phone type="Home">785-555-1234</Phone> 
 </Employee> 

 <Employee id="102"> 
   <Name> 
     <First>Bob</First> 
     <Last>Jones</Last> 
   </Name> 
   <Address> 
     <Street>456 Puke Drive</Street> 
     <City>Lawrence</City> 
     <State>KS</State> 
     <Zip>66049</Zip> 
   </Address> 
   <Phone type="Home">785-555-1235</Phone> 
 </Employee> 

</Employees>

Running the program on this file prints the following in a shell:

Listing 7 Result

[nowiki]
 setDocumentLocator
 StartDocument
 StartElement : <Employees >
 Characers :
 
 Characers :
 StartElement : <Employee id=101>
 Characers :
 
 Characers :
 StartElement : <Name >
 Characers :
 
 Characers :
 StartElement : <First >
 Characers : John
 EndElement : </First>
 Characers :
 
 Characers :
 StartElement : <Last >
 Characers : Smith
 EndElement : </Last>
 Characers :
 
 Characers :
 .....
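The many empty character lines in this result appear to come from the line breaks and indentation between the elements of the test file, which the parser reports through characters() like any other text (no ignorableWhitespace line shows up in the listing). If only the real text is of interest, the handler can skip strings made of whitespace alone. The following is a minimal sketch of such a check, written against std::string so that it compiles without the SDK; isWhitespaceOnly, describeCharacters and demoWhitespaceFilter are hypothetical names, not part of the listings above. In XFlatXml::characters one could, for instance, build the std::string from the OString via its getStr() method before calling it.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true when text is empty or consists only of whitespace
// as classified by std::isspace (space, tab, CR, LF, ...).
bool isWhitespaceOnly(const std::string& text) {
    for (unsigned char c : text) {
        if (!std::isspace(c)) {
            return false;
        }
    }
    return true;
}

// Hypothetical formatting helper: returns the line a characters()
// callback could print, or an empty string when the text is
// whitespace only and should be skipped.
std::string describeCharacters(const std::string& text) {
    if (isWhitespaceOnly(text)) {
        return std::string();
    }
    return "Characters : " + text;
}

// Usage with the kinds of strings seen in the result listing.
void demoWhitespaceFilter() {
    assert(describeCharacters("\n    ").empty());
    assert(describeCharacters("John") == "Characters : John");
}
```

Filtering in the handler keeps the parser configuration untouched. The drawback is that whitespace which is significant in a given document (inside mixed content, for example) would also be dropped when it arrives as a separate event, so the check is only a reasonable default for data-style files such as the employee list.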
