XML Load

From Apache OpenOffice Wiki

ODF documents are zipped archives of XML files, together with some assorted metadata and pictures. Reading and parsing the XML accounts for a noticeable share of the time spent opening a document. Below is an analysis of XML processing.

We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.

Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around. This saves string allocation time and provides a speedup.
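The idea of passing tokens instead of tag-name strings can be illustrated with a small sketch. This is not the FastXML prototype itself: the token ids, the `tokenize` and `startTagTokens` helpers, and the three element names are hypothetical, and the scanner is a toy that ignores comments, CDATA and attributes. It assumes a C++17 compiler for `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical token ids; the real FastXML token set is not shown in this page.
enum Token : std::int32_t {
    TOKEN_UNKNOWN = -1,
    TOKEN_P = 0,
    TOKEN_SPAN = 1,
    TOKEN_TABLE = 2
};

// Map a tag name to an integer token. The lookup works on a view into the
// input buffer, so no std::string is allocated per tag.
inline Token tokenize(std::string_view name) {
    static const std::unordered_map<std::string_view, Token> table = {
        {"text:p", TOKEN_P},
        {"text:span", TOKEN_SPAN},
        {"table:table", TOKEN_TABLE}};
    const auto it = table.find(name);
    return it == table.end() ? TOKEN_UNKNOWN : it->second;
}

// Toy scanner: collect the token of every start tag (or empty-element tag)
// in the buffer. End tags, processing instructions and declarations are skipped.
inline std::vector<Token> startTagTokens(std::string_view xml) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string_view::npos) {
        ++pos;  // step past '<'
        if (pos >= xml.size()) break;
        const char c = xml[pos];
        if (c == '/' || c == '?' || c == '!') continue;
        const std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
        if (end == std::string_view::npos) break;
        out.push_back(tokenize(xml.substr(pos, end - pos)));
        pos = end;
    }
    return out;
}
```

Downstream handlers can then switch on the integer token rather than comparing strings, which is where the saving in string allocation would come from.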


  • Compare OpenOffice TestXML and TestFastXML on the sample documents
  • Compare different XML parsers & APIs in terms of processing content.xml
  • Compare time spent in XML parsing, building document model, and rendering


  • Time is measured in the same way for each test, based on Time::GetSystemTicks.
  • File handling and parsing are done as similarly as possible, within the allowances of API differences.
  • Only C and C++ parsers are considered; Java-based parsers/wrappers are excluded.
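A uniform timing wrapper along these lines might look like the sketch below. It is only an illustration: `std::chrono::steady_clock` stands in for Time::GetSystemTicks, which is not reproduced here, and the `elapsedMs` helper and its `repetitions` parameter are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>

// Run the given callable `repetitions` times and return the total elapsed
// wall-clock time in milliseconds. Using the same wrapper around every
// parser keeps the measurement method identical across tests.
template <typename Fn>
std::int64_t elapsedMs(Fn&& run, int repetitions = 1) {
    using Clock = std::chrono::steady_clock;  // stand-in for Time::GetSystemTicks
    const auto start = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        run();
    }
    const auto stop = Clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
        .count();
}
```

Each parser under test would be wrapped in a callable that opens the same content.xml and parses it, so that only the parser differs between runs.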


  • FastXML provides a good speedup across the test suite.
  • libxml2 (SAX API) is the fastest from a pure parsing point of view.
    • libxml2 (Reader/processNode) is slower, but comparable to expat
    • expat is faster than xerces (SAX & SAX2) as well as OO.o.
    • OO.o parser has some UNO interface overhead (to be measured)

Ongoing Work

  • Performance counter to measure proportion of time spent in:
    1. Opening & uncompressing container files
    2. XML Parser setup
    3. Actual Parsing
    4. String Allocation
    5. Building Doc Model
    6. Rendering
  • Replace expat with libxml2 in the office to compare performance (currently not working for all files)
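One way such a performance counter could be structured is sketched below. The `Phase` enum mirrors the phases listed above, but the `PhaseCounter` class and its method names are hypothetical and are not taken from the OpenOffice code base.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Load phases to attribute time to (hypothetical names).
enum class Phase : std::size_t {
    OpenContainer,
    ParserSetup,
    Parsing,
    StringAlloc,
    BuildModel,
    Rendering,
    Count  // number of phases, not a phase itself
};

// Accumulates time per phase and reports each phase's share of the total.
class PhaseCounter {
public:
    using Duration = std::chrono::nanoseconds;

    PhaseCounter() { totals_.fill(Duration::zero()); }

    // Add a measured interval to the running total of one phase.
    void add(Phase p, Duration d) { totals_[index(p)] += d; }

    // Fraction of all recorded time spent in phase p (0.0 if nothing recorded).
    double proportion(Phase p) const {
        Duration sum = Duration::zero();
        for (const Duration& d : totals_) sum += d;
        if (sum.count() == 0) return 0.0;
        return static_cast<double>(totals_[index(p)].count()) /
               static_cast<double>(sum.count());
    }

private:
    static std::size_t index(Phase p) { return static_cast<std::size_t>(p); }
    std::array<Duration, static_cast<std::size_t>(Phase::Count)> totals_;
};
```

The loader would call `add` around each stage of opening a document, and `proportion` would then give the split between, for example, parsing and rendering.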


See File:Xml-load-compare.ods.
