Import Performance of XLSX

From Apache OpenOffice Wiki
Jump to: navigation, search


OOX import issue 96758

A document in xlsx format found somewhere on the internet (issue 96758).

Findings:

  • Takes more than 20 minutes to load in a debug session
  • about 95% of total load time in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (more than 1 second per call)
    • ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
      • ~100% in ::oox::xls::Xf::writeToPropertySet()
        • root cause: multiple XPropertySet accesses while writing font properties

2008-12-10: First step. Reduce run time of ::oox::xls::Xf::writeToPropertySet().

  • Consolidated property set usage to one API call per XF (cell format object). Load time reduced from >20 minutes to 3:45 minutes. Woohoo.
    • Still spends 68% of total load time (2:33 minutes) in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (132 ms per call)

2008-12-12: Second step. Optimize overall property set usage.

  • Changed interfaces of ::oox::PropertyMap and ::oox::PropertySet from property name (string) to property identifiers (integer). Identifiers will be generated on compile time from a text file with all used property names. A process singleton (created on demand) will contain a big vector of property name strings. Saves a few seconds of the total load time.

2008-12-18: Third step. Reduce call count of ::oox::xls::WorksheetData::convertRowFormat() by caching row formats and applying them at row ranges if formatting is equal across the rows.

  • Turns out that the 1160 formatted rows could be merged into 13 ranges. This means that the 1160 API calls have been reduced to 13! Load time reduced from 3:40 minutes to 1:07 minutes. Needed time to format the 1160 rows reduced from 2:33 minutes to 2 seconds!
Personal tools