XML Load
From Apache OpenOffice Wiki
An ODF document is a zipped archive of XML files, together with pictures and some assorted metadata. Reading and parsing the XML accounts for part of the time spent opening a document. Below is an analysis of XML processing.
We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.
Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around instead of strings. This avoids string allocations and speeds up processing.
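The tokenizing idea can be illustrated with a minimal sketch. It is not the FastXML code itself: the token ids, the tag names in the table, and the helper names `tokenize` and `startsParagraph` are all hypothetical, and the sketch assumes C++17 for `std::string_view`. It only shows how a tag name can be mapped to an integer once, so that later stages compare integers rather than allocate and compare strings.

```cpp
#include <cstdint>
#include <string_view>
#include <unordered_map>

// Hypothetical token ids; the real FastXML token set is not shown in the source.
enum class Token : std::int32_t { Unknown = 0, Body, Paragraph, Span, Table };

// Map a tag name to its token without allocating a new string:
// std::string_view only refers to bytes that already exist elsewhere,
// for example in the parser's input buffer.
inline Token tokenize(std::string_view name) {
    // The keys are string literals, so the views stay valid for the program's lifetime.
    static const std::unordered_map<std::string_view, Token> table = {
        {"office:body", Token::Body},
        {"text:p", Token::Paragraph},
        {"text:span", Token::Span},
        {"table:table", Token::Table},
    };
    const auto it = table.find(name);
    return it == table.end() ? Token::Unknown : it->second;
}

// A consumer can dispatch on an integer instead of comparing strings.
inline bool startsParagraph(Token t) { return t == Token::Paragraph; }
```

A handler would call `tokenize` once per element and hand the resulting `Token` to the import code. Names that are not in the table come back as `Token::Unknown`, so a real implementation would still need a string path for unknown elements.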
Comparisons
- Compare OpenOffice TestXML and TestFastXML on the sample documents
- Compare different XML parsers & APIs in terms of processing content.xml
- Compare time spent in XML parsing, building document model, and rendering
Methodology
- Time is measured the same way in every test, based on Time::GetSystemTicks.
- File handling and parsing are done as similarly as possible, within the limits of API differences.
- Only C and C++ parsers are considered - Java based parsers/wrappers are excluded.
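The methodology above could be sketched as a small harness that times every parser through the same code path. This is an illustration under stated assumptions, not the test code used for these measurements: `std::chrono::steady_clock` stands in for Time::GetSystemTicks, and `nowMs`, `countTags`, and `timeParse` are hypothetical names. `countTags` is only a placeholder for a real parser.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Stand-in for a tick counter such as Time::GetSystemTicks: milliseconds
// read from a monotonic clock. This substitution is an assumption of the sketch.
inline long long nowMs() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical placeholder for a parser under test: counts '<' characters.
inline std::size_t countTags(const std::string& xml) {
    std::size_t n = 0;
    for (char c : xml) {
        if (c == '<') ++n;
    }
    return n;
}

// Run any parser callable over the same input and return the elapsed
// milliseconds, so that each parser is measured in an identical way.
template <typename Parser>
long long timeParse(Parser parse, const std::string& xml, int repeats) {
    const long long start = nowMs();
    for (int i = 0; i < repeats; ++i) {
        parse(xml);
    }
    return nowMs() - start;
}
```

Each parser would be wrapped in a callable that takes the extracted content.xml as a string and is then passed to `timeParse` with the same repeat count. An optimizing compiler may remove a call whose result is unused, so a real harness should consume the parser's output in some way.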
Results
- FastXML provides a good speedup across the test suite.
- Expat is the fastest of the parsers compared.
Ongoing Work
- Performance counter to measure the proportion of time spent in:
  - Container file uncompress & open
  - XML parser setup
  - Actual parsing
  - String allocation
  - Building the document model
  - Rendering
The code has been partially instrumented; the runs have not been done yet.
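A per-phase counter of the kind described above might look like the following sketch. It is not the instrumentation that was added to the code: the class names `PhaseCounter` and `ScopedPhase` and the phase labels are hypothetical, and `std::chrono::steady_clock` is an assumed time source. The sketch accumulates elapsed time per named phase and reports each phase's share of the total.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates elapsed time per named phase and reports each phase's share.
class PhaseCounter {
public:
    using Clock = std::chrono::steady_clock;

    // Add an already measured duration (in microseconds) to a phase.
    void add(const std::string& phase, long long micros) { totals_[phase] += micros; }

    // Fraction of all recorded time that was spent in `phase`, in [0, 1].
    // Returns 0 for an unknown phase or when nothing has been recorded.
    double share(const std::string& phase) const {
        long long sum = 0;
        for (const auto& kv : totals_) sum += kv.second;
        const auto it = totals_.find(phase);
        if (sum == 0 || it == totals_.end()) return 0.0;
        return static_cast<double>(it->second) / static_cast<double>(sum);
    }

private:
    std::map<std::string, long long> totals_;
};

// RAII helper: measures from construction to destruction and records
// the elapsed time under the given phase name.
class ScopedPhase {
public:
    ScopedPhase(PhaseCounter& counter, std::string phase)
        : counter_(counter), phase_(std::move(phase)), start_(PhaseCounter::Clock::now()) {}

    ~ScopedPhase() {
        using namespace std::chrono;
        const auto elapsed = PhaseCounter::Clock::now() - start_;
        counter_.add(phase_, duration_cast<microseconds>(elapsed).count());
    }

    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

private:
    PhaseCounter& counter_;
    std::string phase_;
    PhaseCounter::Clock::time_point start_;
};
```

Placing a `ScopedPhase` at the top of each stage (uncompress, parser setup, parsing, and so on) records that stage's time when the scope ends, and `share` then gives the proportion for each stage. Nested scopes would count the inner time twice, so the phases are assumed not to overlap.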