XML Load

From Apache OpenOffice Wiki

Latest revision as of 11:42, 26 February 2009

An ODF document is a zipped archive of XML files, along with some assorted information and pictures. Reading and parsing the XML accounts for a notable part of the time spent opening documents. Below is an analysis of XML processing.

We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.

Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around. This saves string allocation time and provides a speedup.
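
The tokenizing idea can be illustrated with a minimal sketch. This is not the FastXML component itself: it assumes Python's standard expat binding, and the token table `TOKENS`, the `UNKNOWN` value and the `tokenize` function are hypothetical names chosen for the example. Each tag name is looked up once and replaced by an integer, so later stages compare integers rather than allocating and comparing strings.

```python
import xml.parsers.expat

# Hypothetical token table: each known element name gets a small integer once.
TOKENS = {"document": 1, "body": 2, "p": 3}
UNKNOWN = 0  # token used for element names that are not in the table


def tokenize(xml_bytes):
    """Return the integer tokens for the start tags found in xml_bytes."""
    seen = []

    def on_start(name, attrs):
        # One lookup per tag; consumers then work with integers only.
        seen.append(TOKENS.get(name, UNKNOWN))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return seen
```

For example, `tokenize(b"<document><body><p/></body></document>")` yields one token per start tag, in document order.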

Comparisons

  • Compare OpenOffice TestXML and TestFastXML for doc sample
  • Compare different XML parsers & APIs in terms of processing content.xml
  • Compare time spent in XML parsing, building document model, and rendering

Methodology

  • Time is measured in the same way for each test, based on Time::GetSystemTicks
  • File handling and parsing are done as similarly as possible, within the allowances of API differences
  • Only C and C++ parsers are considered; Java-based parsers and wrappers are excluded
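
A rough sketch of such a harness follows. It is an illustration only, not the test code used here: it assumes Python's standard library, uses `time.perf_counter` as a stand-in for Time::GetSystemTicks, and the names `time_parser`, `parse_with_expat` and `parse_with_sax` are hypothetical. Every parser receives the same bytes and is timed by the same clock, which is the point of the methodology above.

```python
import time
import xml.parsers.expat
import xml.sax


def parse_with_expat(data):
    """Parse with the raw expat binding, with no handlers installed."""
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(data, True)


def parse_with_sax(data):
    """Parse through the SAX layer with a do-nothing content handler."""
    xml.sax.parseString(data, xml.sax.ContentHandler())


def time_parser(parse, data, repeats=5, clock=time.perf_counter):
    """Return the best elapsed time of `repeats` runs of parse(data)."""
    best = None
    for _ in range(repeats):
        start = clock()
        parse(data)
        elapsed = clock() - start
        if best is None or elapsed < best:
            best = elapsed
    return best
```

In Python the SAX layer sits on top of expat, so comparing the two functions measures API overhead rather than two independent parsers.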

Results

  • FastXML provides a good speedup across the test suite.
  • libxml2 (SAX API) is the fastest from a pure parsing point of view.
    • libxml2 (Reader/processNode) is slower, but comparable to expat.
    • expat is faster than xerces (SAX & SAX2) as well as OO.o.
    • OO.o parser has some UNO interface overhead (to be measured).

Ongoing Work

  • Performance counter to measure proportion of time spent in:
    1. Opening & uncompressing container files
    2. XML Parser setup
    3. Actual Parsing
    4. String Allocation
    5. Building Doc Model
    6. Rendering
  • Replace expat with libxml2 to compare performance inside the office. This currently does not work for all files.
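
The per-phase measurement listed above could be sketched as follows. This is not the instrumentation in the office code: it assumes Python's standard library, covers only the first three phases, and `PhaseTimer` and `load_content` are hypothetical names. Each phase is wrapped in a timer that accumulates elapsed time, from which the proportion per phase is derived.

```python
import io
import time
import zipfile
import xml.parsers.expat
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed time per named phase."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def proportions(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.totals.values())
        return {n: (t / total if total else 0.0) for n, t in self.totals.items()}


def load_content(container_bytes, timer):
    """Read content.xml from a zipped container and parse it, timing each phase."""
    with timer.phase("open and uncompress"):
        with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
            data = archive.read("content.xml")
    with timer.phase("parser setup"):
        parser = xml.parsers.expat.ParserCreate()
    with timer.phase("parsing"):
        parser.Parse(data, True)
    return data
```

The clock is injectable so the same counter can be driven by a different tick source, or by a fake clock in tests.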

Data

The data is available as a spreadsheet: File:Xml-load-compare.ods.
