Difference between revisions of "XML Load"
From Apache OpenOffice Wiki
== Data ==
The [[Image:Xml-load-compare.ods|spreadsheet]].
[[Category:Performance]]
Latest revision as of 11:42, 26 February 2009
An ODF document is a zipped archive of XML files along with some assorted information and pictures. Reading and parsing the XML accounts for a sizeable share of the time spent opening a document. Below is an analysis of XML processing.
We use a suite of 60 performance-related test documents. The content.xml of each document has been extracted for XML-only tests.
Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around. This saves string allocation time and provides a speedup.
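The tokenizing idea can be illustrated with a minimal sketch. This is not the FastXML prototype itself: the token table, the `UNKNOWN` value and the function name `tokenize_tags` are hypothetical, and Python's bundled expat binding is used only because it is easy to run. The point is that each tag name is mapped to a small integer once, so later stages can compare integers instead of allocating and comparing strings.

```python
from xml.parsers import expat

# Hypothetical token table; the real FastXML token set is not given on this page.
TOKENS = {
    "office:document-content": 1,
    "text:p": 2,
    "text:span": 3,
}
UNKNOWN = 0  # assumed fallback for tags without a predefined token

# Small stand-in for a content.xml fragment (illustrative only).
SAMPLE = (
    b"<office:document-content>"
    b"<text:p>hi<text:span>x</text:span></text:p>"
    b"<draw:frame/>"
    b"</office:document-content>"
)


def tokenize_tags(xml_bytes):
    """Return the integer token of every start tag, in document order."""
    seen = []
    # Without a namespace separator, expat reports prefixed names
    # such as "text:p" unchanged.
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        seen.append(TOKENS.get(name, UNKNOWN))

    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return seen
```

For the sample above, `tokenize_tags(SAMPLE)` returns `[1, 2, 3, 0]`; the trailing `0` is the `draw:frame` tag, which has no entry in the assumed table.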
Comparisons
- Compare OpenOffice TestXML and TestFastXML on the sample documents
- Compare different XML parsers & APIs in terms of processing content.xml
- Compare time spent in XML parsing, building document model, and rendering
Methodology
- Time is measured in the same way for each test, based on Time::GetSystemTicks.
- File handling and parsing are done as similarly as possible, within the allowances of API differences.
- Only C and C++ parsers are considered; Java-based parsers and wrappers are excluded.
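A rough sketch of "the same measurement for every parser" might look like the following. It is an assumption-laden illustration, not the harness used for these tests: `time.perf_counter` stands in for Time::GetSystemTicks, the helper names `time_parse` and `parse_with_expat` are hypothetical, and taking the best of several repeats is a choice made here, not something stated above.

```python
import time
from xml.parsers import expat


def time_parse(parse_fn, data, repeats=5):
    """Time parse_fn(data) identically for any parser; return the best run in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # stand-in for Time::GetSystemTicks
        parse_fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def parse_with_expat(data):
    """Parse data with expat and return the number of start tags seen."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return count
```

Any other parser can be compared by wrapping it in a function with the same `parse_fn(data)` shape and passing it to `time_parse`, so that file handling and timing stay identical across candidates.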
Results
- FastXML provides a good speedup across the test suite.
- libxml2 (SAX API) is the fastest from a pure parsing point of view.
  - libxml2 (Reader/processNode) is slower, but comparable to expat.
  - expat is faster than Xerces (SAX and SAX2) as well as the OO.o parser.
  - The OO.o parser has some UNO interface overhead (to be measured).
Ongoing Work
- Performance counters to measure the proportion of time spent in:
  - Opening and uncompressing container files
  - XML parser setup
  - Actual parsing
  - String allocation
  - Building the document model
  - Rendering
- Replace expat with libxml2 to compare performance in the office suite; this currently does not work for all files.
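The per-phase performance counters could be sketched as below. This is only an illustration under stated assumptions: `PhaseTimer`, `load_content` and `make_sample_container` are hypothetical names, only the first three phases from the list are covered, and the container is a tiny in-memory zip rather than a real ODF file.

```python
import io
import time
import zipfile
from collections import defaultdict
from contextlib import contextmanager
from xml.parsers import expat


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def proportions(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.totals.values())
        return {n: (t / total if total else 0.0) for n, t in self.totals.items()}


def make_sample_container():
    """Build a tiny zip with a content.xml entry (stand-in for an ODF file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc><p/><p/></doc>")
    return buf.getvalue()


def load_content(container_bytes, timer):
    """Open the container, set up the parser and parse, timing each phase."""
    with timer.phase("open and uncompress"):
        with zipfile.ZipFile(io.BytesIO(container_bytes)) as zf:
            xml_bytes = zf.read("content.xml")
    with timer.phase("parser setup"):
        names = []
        parser = expat.ParserCreate()
        parser.StartElementHandler = lambda name, attrs: names.append(name)
    with timer.phase("parsing"):
        parser.Parse(xml_bytes, True)
    return names
```

After `load_content(make_sample_container(), timer)`, calling `timer.proportions()` gives a mapping from phase name to its fraction of the measured time; further phases such as building the document model would be wrapped in the same way.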