Difference between revisions of "XML Load"

From Apache OpenOffice Wiki

Revision as of 03:46, 16 February 2006

An ODF document is a zipped archive of XML files along with some auxiliary information and pictures. Reading and parsing the XML accounts for a noticeable part of the time spent opening a document. Below is an analysis of XML processing.

We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.

Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around. This saves string allocation time and provides a speedup.
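The tokenizing idea can be illustrated with a minimal sketch. This is not the FastXML code: `TokenTable`, `tokenizeStartTags` and `kUnknownToken` are hypothetical names invented for illustration, and the scanner only handles simple start tags. It shows element names being mapped to integers once, so that later stages can compare integers instead of allocating and comparing strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token type: a small integer standing in for an element name.
using Token = std::int32_t;
constexpr Token kUnknownToken = -1;

// Maps known element names to integer tokens, assigned in registration order.
class TokenTable {
public:
    // Registers a name (if new) and returns its token.
    Token add(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const Token token = static_cast<Token>(ids_.size());
        ids_.emplace(name, token);
        return token;
    }

    // Looks a name up without allocating; std::less<> allows string_view keys.
    Token lookup(std::string_view name) const {
        auto it = ids_.find(name);
        return it == ids_.end() ? kUnknownToken : it->second;
    }

private:
    std::map<std::string, Token, std::less<>> ids_;
};

// Scans the text for start tags and returns one token per tag.
// End tags, processing instructions and comments are skipped.
std::vector<Token> tokenizeStartTags(std::string_view xml, const TokenTable& table) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string_view::npos) {
        ++pos;  // step past '<'
        if (pos >= xml.size()) break;
        const char c = xml[pos];
        if (c == '/' || c == '?' || c == '!') continue;
        const std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
        if (end == std::string_view::npos) break;
        tokens.push_back(table.lookup(xml.substr(pos, end - pos)));
        pos = end;
    }
    return tokens;
}
```

A consumer would register the element names it cares about up front (for example `table.add("text:p")`) and then switch on the returned integers; names that were never registered come back as `kUnknownToken`.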

Comparisons

  • Compare OpenOffice TestXML and TestFastXML on the sample documents
  • Compare different XML parsers & APIs in terms of processing content.xml
  • Compare time spent in XML parsing, building document model, and rendering

Methodology

  • Time is measured the same way for each test, based on Time::GetSystemTicks.
  • File handling and parsing are done as similarly as possible, within the allowances of API differences.
  • Only C and C++ parsers are considered; Java-based parsers/wrappers are excluded.
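The methodology above could be sketched as a small harness that runs every parser over the same document with the same clock. This is an illustration only: `std::chrono::steady_clock` stands in for Time::GetSystemTicks, which is not available outside the OpenOffice code base, and `timeMillis`, `NamedParser` and `compareParsers` are hypothetical names. Each real parser (Expat, the OpenOffice SAX component, and so on) would be wrapped in a callable that takes the document text.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Average wall-clock time of one call to fn, in milliseconds.
template <typename Fn>
double timeMillis(Fn&& fn, int repeats = 1) {
    using Clock = std::chrono::steady_clock;
    if (repeats < 1) repeats = 1;
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) fn();
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}

// A parser under test: a label plus a callable that parses one document.
using ParseFn = std::function<void(const std::string&)>;
struct NamedParser {
    std::string label;
    ParseFn parse;
};

struct TimingResult {
    std::string label;
    double millis;
};

// Runs every parser on the same in-memory document, measured the same way.
std::vector<TimingResult> compareParsers(const std::vector<NamedParser>& parsers,
                                         const std::string& document,
                                         int repeats) {
    std::vector<TimingResult> results;
    results.reserve(parsers.size());
    for (const auto& parser : parsers) {
        const double ms = timeMillis([&] { parser.parse(document); }, repeats);
        results.push_back({parser.label, ms});
    }
    return results;
}
```

Loading the content.xml into memory before timing keeps file handling out of the parse-only measurement; whether that matches the original tests is an assumption.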

Results

  • FastXML provides a good speedup across the test suite.
  • Expat is the fastest parser.

Ongoing Work

  • Performance counters to measure the proportion of time spent in:
    1. Container file uncompress & open
    2. XML Parser setup
    3. Actual Parsing
    4. String Allocation
    5. Building Doc Model
    6. Rendering
  • Isolate the SAX/SAX2 component in OpenOffice and repeat XML parse-only tests

The code has been partially instrumented; the runs have not been done yet.
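One way such counters might look is sketched below. It is not the instrumentation that was added to OpenOffice: `Phase`, `PhaseCounters` and `ScopedPhase` are hypothetical names, and `std::chrono::steady_clock` is an assumed time source. Each phase from the list above gets an accumulator, and a scoped guard adds the elapsed time of a block to its phase when the block exits.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// The six phases from the list above; Count is only used for sizing.
enum class Phase : std::size_t {
    Uncompress, ParserSetup, Parsing, StringAllocation, BuildDocModel, Rendering, Count
};

// Accumulates elapsed time per phase and reports each phase's share.
class PhaseCounters {
public:
    using Clock = std::chrono::steady_clock;
    using Duration = Clock::duration;

    void add(Phase phase, Duration elapsed) { totals_[index(phase)] += elapsed; }

    Duration total(Phase phase) const { return totals_[index(phase)]; }

    // Fraction of all recorded time spent in this phase (0 if nothing recorded).
    double share(Phase phase) const {
        Duration sum = Duration::zero();
        for (const Duration& d : totals_) sum += d;
        if (sum == Duration::zero()) return 0.0;
        return static_cast<double>(total(phase).count()) /
               static_cast<double>(sum.count());
    }

private:
    static std::size_t index(Phase phase) { return static_cast<std::size_t>(phase); }
    std::array<Duration, static_cast<std::size_t>(Phase::Count)> totals_{};
};

// Adds the lifetime of this object to the given phase on destruction.
class ScopedPhase {
public:
    ScopedPhase(PhaseCounters& counters, Phase phase)
        : counters_(counters), phase_(phase), start_(PhaseCounters::Clock::now()) {}
    ~ScopedPhase() { counters_.add(phase_, PhaseCounters::Clock::now() - start_); }
    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

private:
    PhaseCounters& counters_;
    Phase phase_;
    PhaseCounters::Clock::time_point start_;
};
```

Placing `ScopedPhase guard(counters, Phase::Parsing);` at the top of a block would charge that block's run time to parsing. Nested guards would count the inner time twice (string allocation inside parsing, for instance), so in this sketch the shares are only meaningful if the measured blocks do not overlap.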

Data
