Difference between revisions of "Calc/Performance/Specific Bottlenecks"

From Apache OpenOffice Wiki
Jump to: navigation, search
(Sorting values within functions: solved using ::std::nth_element() where possible)
m (Calc/To-Dos/Performance/Specific Bottlenecks moved to Calc/Performance/Specific Bottlenecks: Make subpage hierarchy static, dynamically assign categories instead.)
(No difference)

Revision as of 15:22, 10 February 2009

Specific bottlenecks to be worked on, identified using tools such as valgrind --tool=callgrind.

The Zaske case

Comparison with Excel 2003/2007 that need 1.2s where Calc needs 24s after changing a cell's value.


Findings: lots of formulas directly or indirectly referring the input cell, with many listening to identical ranges.

Fix: Introduce the now existing bulk broadcaster that was already used for mass changes also for single cell changes to prevent repetitive broadcasts of identical ranges.

Fixed as i95967 in CWS calcperf03. Now Calc does it in 1.2s too..

The Ou case

Loading a large plain data file takes very long.



  • source/filter/xml/xmlsubti.cxx
    • 38% of time spent in ScMyTables::NewColumn() because of replicated use of aTableVec[nTableCount - 1] (vector::operator[])
      Note: percentage may be off due to compilation without optimization to obtain exact line numbers that may result in STLport's vector methods being differently compiled.
      • proposed fix: should obtain the pointer once instead.
    • Similar for other places where aTableVec[xxx] is used.
  • TODO: Check all ScMyTables::.*() and ScMyTableData::.*()
    • Especially for 63342857 calls to AddColumn() and NewColumn() that result in 1168654944 calls to operator[] ...
    • 63081776 calls to AddColumn() originate from ScXMLTableRowCellContext::EndElement()
    • Those are highly suspicious and seem to indicate that too many temporary elements are created for empty columns/cells (needs verification).

Sorting values within functions

i89976 has a document attached:


NOTE: Some assumptions made by the submitter as documented in the test case are plain wrong.

Findings when testing with filling C4:C3003

  • 52% overall in interpr3.cxx lcl_QuickSort() and below, of which
    • 32% in vector<double>::operator[] and below,
      • 25% originating from the loops
       while (ni <= nHi && rSortArray[ni]  < rSortArray[nLo]) ni++;
       while (nj >= nLo && rSortArray[nLo] < rSortArray[nj])  nj--;

where rSortArray[nLo] should be a temporary variable instead.

Or all that be realized using simple double[].

  • 21% overall in ScValueIterator::GetThis() and below.

Solution is in CWS DEV300 calcperf03  , besides the loops above being changed to use a temporary value, which benefits other functions, the use of a fully sorted array for functions MEDIAN, PERCENTILE, QUARTILE, LARGE and SMALL was eliminated by using ::std::nth_element() instead. Filling entire column C as described in the document now needs ~12 minutes instead of an hour or so. Excel needed ~15 minutes, but that can't be exactly compared because it was on different machines.

Querying data within functions

An internal customer's document (sorry, can't publish) doing lookup queries that don't fit into the current caching strategy.


  • 8% in 51613353 calls to com::sun::star::i18n::casefolding::getNextChar() via
    • 39696595 calls to utl::TransliterationWrapper::isEqual() via
      • ScTable::ValidQuery() via
        • 8888 calls to ScQueryCellIterator::GetThis() via
          • lcl_LookupQuery()
  • 5% in ScTableValidQuery() most in String() and ~String() of aCellStr
  • 200873636 calls to com::sun::star::i18n::casefolding::getNextChar() via
    • 33173401 calls to com::sun::star::i18n::Transliteration_caseignore::compare()
  • 5% in com::sun::star::i18n::oneToOneMappingWithFlag::find()
    • Replicated mpIndex[high] access, might be better using temporary pointer.
  • 5% in com::sun::star::i18n::casefolding::getValue()
  • 58% overall in ScTable::ValidQuery() and below
    • TODO: Cache results of ValidQuery()? Similar to ScLookupCache?
  • 11% overall in 27341713 calls to ScBroadcastAreaSlot::StartListeningArea() and below, of which 10% are in ::std::set::insert() and below.
    • TODO: refactor implementation of broadcast slots.

OOX import issue 96758

A document in xlsx format found somewhere on the internet (issue 96758).


  • Takes more than 20 minutes to load in a debug session
  • about 95% of total load time in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (more than 1 second per call)
    • ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
      • ~100% in ::oox::xls::Xf::writeToPropertySet()
        • root cause: multiple XPropertySet accesses while writing font properties

2008-12-10: First step. Reduce run time of ::oox::xls::Xf::writeToPropertySet().

  • Consolidated property set usage to one API call per XF (cell format object). Load time reduced from >20 minutes to 3:45 minutes. Woohoo.
    • Still spends 68% of total load time (2:33 minutes) in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (132 ms per call)

2008-12-12: Second step. Optimize overall property set usage.

  • Changed interfaces of ::oox::PropertyMap and ::oox::PropertySet from property name (string) to property identifiers (integer). Identifiers will be generated on compile time from a text file with all used property names. A process singleton (created on demand) will contain a big vector of property name strings. Saves a few seconds of the total load time.

2008-12-18: Third step. Reduce call count of ::oox::xls::WorksheetData::convertRowFormat() by caching row formats and applying them at row ranges if formatting is equal across the rows.

  • Turns out that the 1160 formatted rows could be merged into 13 ranges. This means that the 1160 API calls have been reduced to 13! Load time reduced from 3:40 minutes to 1:07 minutes. Needed time to format the 1160 rows reduced from 2:33 minutes to 2 seconds!
Personal tools