12 April 2014: The OpenOffice Wiki is not, and never was, affected by the heartbleed bug. Users' passwords are safe and wiki users do not need take any actions.

Import Performance of XLSX

From Apache OpenOffice Wiki
Jump to: navigation, search


OOX import issue 96758

A document in xlsx format found somewhere on the internet (issue 96758).

Findings:

  • Takes more than 20 minutes to load in a debug session
  • about 95% of total load time in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (more than 1 second per call)
    • ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
      • ~100% in ::oox::xls::Xf::writeToPropertySet()
        • root cause: multiple XPropertySet accesses while writing font properties

2008-12-10: First step. Reduce run time of ::oox::xls::Xf::writeToPropertySet().

  • Consolidated property set usage to one API call per XF (cell format object). Load time reduced from >20 minutes to 3:45 minutes. Woohoo.
    • Still spends 68% of total load time (2:33 minutes) in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (132 ms per call)

2008-12-12: Second step. Optimize overall property set usage.

  • Changed interfaces of ::oox::PropertyMap and ::oox::PropertySet from property name (string) to property identifiers (integer). Identifiers will be generated on compile time from a text file with all used property names. A process singleton (created on demand) will contain a big vector of property name strings. Saves a few seconds of the total load time.

2008-12-18: Third step. Reduce call count of ::oox::xls::WorksheetData::convertRowFormat() by caching row formats and applying them at row ranges if formatting is equal across the rows.

  • Turns out that the 1160 formatted rows could be merged into 13 ranges. This means that the 1160 API calls have been reduced to 13! Load time reduced from 3:40 minutes to 1:07 minutes. Needed time to format the 1160 rows reduced from 2:33 minutes to 2 seconds!
Personal tools