XML Load

From Apache OpenOffice Wiki
Revision as of 03:43, 16 February 2006 by Dkeskar (Talk | contribs)


An ODF document is a zipped archive of XML files, along with pictures and some assorted metadata. Reading and parsing the XML accounts for part of the time spent opening a document. Below is an analysis of XML processing.

We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.
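As a rough illustration of what an XML-only test input looks like, the sketch below pulls content.xml out of an ODF package and runs it through Expat without building any document model. It uses Python's standard zipfile and xml.parsers.expat modules, not the OpenOffice test code; the helper names and the tiny in-memory package are hypothetical stand-ins for the real test documents.

```python
import io
import zipfile
import xml.parsers.expat


def read_content_xml(odf_file):
    """Return the raw bytes of content.xml from an ODF package."""
    with zipfile.ZipFile(odf_file) as package:
        return package.read("content.xml")


def count_elements(xml_bytes):
    """Parse XML with Expat and count start tags, building no document model."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return count


def make_sample_package():
    """Build a tiny in-memory stand-in for an ODF file (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", "<doc><p>Hello</p><p>World</p></doc>")
    buffer.seek(0)
    return buffer
```

Calling count_elements(read_content_xml(path)) on a real document path would exercise only the unzip and parse steps, which is the part the XML-only tests isolate.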

Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around instead of strings. This saves string allocation time and speeds up processing.
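The tokenizing idea can be sketched as follows. This is not the FastXML prototype, only a minimal illustration that assumes a fixed table mapping known tag names to integers; the table contents and the UNKNOWN token are hypothetical. In Python the Expat binding still creates the name strings, so the sketch shows the shape of the interface rather than the allocation savings a C++ implementation would get.

```python
import xml.parsers.expat

# Hypothetical token table; a real one would cover the ODF element vocabulary.
TOKENS = {"doc": 1, "p": 2, "span": 3}
UNKNOWN = 0  # assumed fallback for tags that are not in the table


def tokenize(xml_bytes):
    """Return the integer tokens of all start tags, in document order."""
    tokens = []

    def on_start(name, attrs):
        # One table lookup; downstream code sees an integer, not a tag string.
        tokens.append(TOKENS.get(name, UNKNOWN))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return tokens
```

Consumers of the token stream can then dispatch on integer comparisons rather than string comparisons.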

Comparisons

  • Compare OpenOffice TestXML and TestFastXML on the sample documents
  • Compare different XML parsers & APIs in terms of processing content.xml
  • Compare the time spent in XML parsing, building the document model, and rendering

Methodology

  • Time is measured the same way in every test, based on Time::GetSystemTicks.
  • File handling and parsing are kept as similar as possible, within the allowances of API differences.
  • Only C and C++ parsers are considered; Java-based parsers and wrappers are excluded.
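The methodology above amounts to running every candidate on the same input under the same timer. A minimal sketch of such a harness is shown below. It assumes time.perf_counter as a stand-in for Time::GetSystemTicks, and it uses two Python standard-library APIs (event-based Expat and tree-based minidom) purely as example candidates; these are not the C and C++ parsers compared in the study. Taking the best of several runs is also an assumption, not something the page specifies.

```python
import time
import xml.dom.minidom
import xml.parsers.expat


def parse_expat(xml_bytes):
    """Event-based parse: visit every start tag, build nothing."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: None
    parser.Parse(xml_bytes, True)


def parse_minidom(xml_bytes):
    """Tree-based parse: build a full DOM in memory."""
    xml.dom.minidom.parseString(xml_bytes)


def time_parser(parse, xml_bytes, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(xml_bytes)
        best = min(best, time.perf_counter() - start)
    return best


def compare(xml_bytes):
    """Time every candidate on the same input with the same timer."""
    candidates = {"expat": parse_expat, "minidom": parse_minidom}
    return {name: time_parser(fn, xml_bytes) for name, fn in candidates.items()}
```

Keeping the timer and the input handling in one place means that only the parse call itself differs between candidates.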

Results

  • FastXML provides a good speedup across the test suite.
  • Expat is the fastest of the parsers compared.

Ongoing Work

  • Performance counters to measure the proportion of time spent in:
    1. Container file uncompress & open
    2. XML Parser setup
    3. Actual Parsing
    4. String Allocation
    5. Building Doc Model
    6. Rendering

The code has been partially instrumented; the measurement runs have not been done yet.
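One way such per-phase counters could look is sketched below. It is not the instrumentation in the OpenOffice code; the PhaseTimer class and its method names are hypothetical, and time.perf_counter is again assumed as the clock. Each phase accumulates its elapsed time under a name, and the proportions are computed at the end.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        """Time the enclosed block and add it to the running total for `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def proportions(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}
```

A load path would wrap each step, for example `with timer.phase("parsing"): ...`, and report `timer.proportions()` after the document has been opened. Phases that are entered more than once, such as string allocation during parsing, simply add to the same total.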

Data
