Difference between revisions of "Calc/Performance/Specific Bottlenecks"

From Apache OpenOffice Wiki
(OOX import issue 96758)
(other)
 
(21 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''Specific bottlenecks''' to be worked on, identified using tools such as
 
<code>[[Valgrind| valgrind --tool=callgrind]]</code>.
 
 
 
 
== The Zaske case ==
 
  
Comparison with Excel 2003/2007, which needs 1.2s where Calc needs 24s after a cell's value is changed.
+
Done. Content relocated to [[Calc/Performance/The_Zaske_case]], section preserved for external references linking here.
 
+
References:
+
* [http://zaske.wordpress.com/2006/06/05/excel-calculation-performance/ Zaske's blog entry]
+
* [http://home.comcast.net/~stzaske/PerfTest.zip The test case file] (.zip)
+
* [http://www.youtube.com/watch?v=fA7RjnlitfM Video on YouTube] (same as on the blog)
+
 
+
Findings: many formulas refer directly or indirectly to the input cell, and many of them listen to identical ranges.
+
 
+
Fix: Use the bulk broadcaster, which already existed and was used for mass changes, for single-cell changes as well, to prevent repeated broadcasts of identical ranges.
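The idea of coalescing broadcasts of identical ranges can be sketched as follows. This is only a minimal illustration under stated assumptions, not Calc's actual bulk broadcaster: <code>Range</code>, <code>BulkBroadcaster</code>, <code>enqueue()</code> and <code>flush()</code> are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <tuple>
#include <utility>

// Hypothetical cell range for illustration; not Calc's own range class.
struct Range {
    int col1, row1, col2, row2;
    bool operator<(const Range& o) const {
        return std::tie(col1, row1, col2, row2) <
               std::tie(o.col1, o.row1, o.col2, o.row2);
    }
};

// Collects ranges and notifies each distinct range exactly once on flush(),
// no matter how often it was enqueued in the meantime.
class BulkBroadcaster {
public:
    explicit BulkBroadcaster(std::function<void(const Range&)> notify)
        : notify_(std::move(notify)) {}

    // Identical ranges collapse into one entry of the set.
    void enqueue(const Range& r) { pending_.insert(r); }

    // Sends the pending notifications and returns how many were sent.
    std::size_t flush() {
        std::size_t sent = 0;
        for (const Range& r : pending_) {
            notify_(r);
            ++sent;
        }
        pending_.clear();
        return sent;
    }

private:
    std::function<void(const Range&)> notify_;
    std::set<Range> pending_;
};
```

If many formulas listen to the same range, enqueuing that range once per formula still results in a single notification when <code>flush()</code> is called.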
+
 
+
Fixed as [http://qa.openoffice.org/issues/show_bug.cgi?id=95967 i95967] in
+
[http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fcalcperf03 CWS calcperf03].
+
Now Calc does it in 1.2s too.
+
 
+
 
+
== The Ou case ==
+
 
+
Loading a large plain data file takes very long.
+
 
+
References:
+
* [http://blogs.zdnet.com/Ou/?p=120 George Ou's blog entry]
+
* [http://www.lanarchitect.net/Examples/200264-l.sxc The test case file] (.sxc)
+
* [http://www.lanarchitect.net/Examples/200264-l.zip Same data, but zipped Excel-XML]
+
 
+
Findings:
+
 
+
* source/filter/xml/xmlsubti.cxx
+
** 38% of time spent in ScMyTables::NewColumn() because of repeated use of aTableVec[nTableCount - 1] (vector::operator[]) <br> Note: the percentage may be off because the code was compiled without optimization to obtain exact line numbers, which may cause STLport's vector methods to be compiled differently.
+
*** proposed fix: should obtain the pointer once instead.
+
** Similar for other places where aTableVec[xxx] is used.
+
 
+
* '''TODO:''' Check all ScMyTables::.*() and ScMyTableData::.*()
+
** Especially for 63342857 calls to AddColumn() and NewColumn() that result in 1168654944 calls to operator[] ...
+
** 63081776 calls to AddColumn() originate from ScXMLTableRowCellContext::EndElement()
+
** Those are highly suspicious and seem to indicate that too many temporary elements are created for empty columns/cells (needs verification).
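The proposed fix of obtaining the pointer once can be sketched as follows. This is a simplified illustration, not the code in xmlsubti.cxx: <code>TableData</code> and both function names are hypothetical, and it assumes that the vector holds pointers to per-table data.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-table bookkeeping object.
struct TableData {
    int columnCount = 0;
    int spannedColumns = 0;
};

// Before: every statement evaluates tables[count - 1] again via operator[].
void newColumnRepeated(std::vector<TableData*>& tables, std::size_t count, int span) {
    tables[count - 1]->columnCount += 1;
    tables[count - 1]->spannedColumns += span;
}

// After: look the element up once and reuse the pointer.
void newColumnCached(std::vector<TableData*>& tables, std::size_t count, int span) {
    TableData* current = tables[count - 1];
    current->columnCount += 1;
    current->spannedColumns += span;
}
```

Both functions produce the same result. An optimizing compiler may already remove the repeated lookups on its own, which is consistent with the note above that the measured percentage may be inflated by the unoptimized build.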
+
 
+
  
 
== Sorting values within functions ==
 
  
[http://www.openoffice.org/issues/show_bug.cgi?id=89976 i89976] has a document attached:
+
Done. Content relocated to [[Calc/Performance/sorting_values_within_functions]], section preserved for external references linking here.
 
+
test-huge_calculations-Median-detailed.ods
+
 
+
'''NOTE:''' The assumptions made by the submitter as documented in the test case are plain wrong.
+
 
+
Findings when testing with filling C4:C3003
+
 
+
* 52% overall in interpr3.cxx lcl_QuickSort() and below, of which
+
** 32% in vector<double>::operator[] and below,
+
*** 25% originating from the loops
+
 
+
        while (ni <= nHi && rSortArray[ni]  < rSortArray[nLo]) ni++;
+
        while (nj >= nLo && rSortArray[nLo] < rSortArray[nj])  nj--;
+
 
+
where rSortArray[nLo] should be held in a temporary variable instead.
+
 
+
Alternatively, all of this could be implemented using a plain double[].
+
 
+
* 21% overall in ScValueIterator::GetThis() and below.
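The two suggestions for the sort loops (keeping the pivot in a temporary variable and working on a plain array) can be sketched as follows. This is a minimal quicksort written for illustration, not the lcl_QuickSort() from interpr3.cxx: the function names and the exact partition scheme are assumptions.

```cpp
#include <utility>
#include <vector>

// Sorts p[lo..hi] in ascending order. The pivot value is copied into a
// local variable once, and elements are accessed through a raw pointer,
// so the inner loops involve no vector::operator[] calls.
void quickSortRange(double* p, long lo, long hi) {
    if (lo >= hi) return;
    const double pivot = p[lo];  // read once instead of in every comparison
    long i = lo;
    long j = hi;
    while (i <= j) {
        while (p[i] < pivot) ++i;
        while (pivot < p[j]) --j;
        if (i <= j) {
            std::swap(p[i], p[j]);
            ++i;
            --j;
        }
    }
    if (lo < j) quickSortRange(p, lo, j);
    if (i < hi) quickSortRange(p, i, hi);
}

// Convenience wrapper for a std::vector<double>.
void sortValues(std::vector<double>& values) {
    if (values.size() < 2) return;
    quickSortRange(values.data(), 0, static_cast<long>(values.size()) - 1);
}
```

Taking the first element as the pivot mirrors the use of rSortArray[nLo] in the loops quoted above. It is kept here only for that reason, since it degrades to quadratic time on already sorted input.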
+
 
+
 
+
== Querying data within functions ==
+
 
+
An internal customer's document (sorry, can't publish) doing lookup queries
+
that don't fit into the current caching strategy.
+
 
+
Findings:
+
 
+
* 8% in 51613353 calls to com::sun::star::i18n::casefolding::getNextChar() via
+
** 39696595 calls to utl::TransliterationWrapper::isEqual() via
+
*** ScTable::ValidQuery() via
+
**** 8888 calls to ScQueryCellIterator::GetThis() via
+
***** lcl_LookupQuery()
+
 
+
* 5% in ScTable::ValidQuery(), mostly in String() and ~String() of aCellStr
+
 
+
* 200873636 calls to com::sun::star::i18n::casefolding::getNextChar() via
+
** 33173401 calls to com::sun::star::i18n::Transliteration_caseignore::compare()
+
 
+
* 5% in com::sun::star::i18n::oneToOneMappingWithFlag::find()
+
** Repeated mpIndex[high] access; using a temporary pointer might be better.
+
 
+
* 5% in com::sun::star::i18n::casefolding::getValue()
+
 
+
* 58% overall in ScTable::ValidQuery() and below
+
** '''TODO:''' Cache results of ValidQuery()? Similar to ScLookupCache?
+
 
+
* 11% overall in 27341713 calls to ScBroadcastAreaSlot::StartListeningArea() and below, of which 10% are in ::std::set::insert() and below.
+
** '''TODO:''' refactor implementation of broadcast slots.
+
 
+
 
+
[[Category:Calc|Performance/Specific_Bottlenecks]]
+
[[Category:To-Do]]
+
[[Category:Performance]]
+
 
+
== OOX import issue 96758 ==
+
 
+
A document in xlsx format found somewhere on the internet ([http://qa.openoffice.org/issues/show_bug.cgi?id=96758 issue 96758]).
+
 
+
Findings:
+
 
+
* Takes more than 20 minutes to load in a debug session
+
  
* about 95% of total load time in ::oox::xls::WorksheetData::convertRowFormat()
+
== other ==
** ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
+
*** ~100% in ::oox::xls::Xf::writeToPropertySet()
+
**** multiple XPropertySet accesses while writing font properties
+
  
2008-12-10: First step. Consolidate property set usage to one API call per XF (cell format object). Load time reduced from 20 minutes to 4 minutes. Woohoo.
+
For other performance optimization tasks previously located here please see the individual pages listed under [[Calc/Performance]].
 +
 
  
2008-12-12: Second step. Change the interface of ::oox::PropertyMap and ::oox::PropertySet from property name strings to integer property identifiers. The identifiers will be generated at compile time from a text file containing all used property names. A process singleton (created on demand) will contain a big vector of property name strings. Saves a few seconds of the 4 minutes.
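The scheme described in the second step can be sketched as follows. This is a simplified illustration, not the actual ::oox::PropertyMap: the enum, the function names and the class below are hypothetical. The three property names are examples only, and values are plain doubles here to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical identifiers. In the scheme described above they would be
// generated at compile time from a text file of property names.
enum PropertyId : int {
    PROP_CharFontName = 0,
    PROP_CharHeight,
    PROP_CharWeight,
    PROP_COUNT
};

// Process-wide table of property names, created on first use.
const std::vector<std::string>& propertyNames() {
    static const std::vector<std::string> names = {
        "CharFontName", "CharHeight", "CharWeight"};
    return names;
}

// Translates an identifier back to its name when an API call needs it.
const std::string& propertyName(PropertyId id) {
    return propertyNames().at(static_cast<std::size_t>(id));
}

// Map keyed by integer identifier instead of by name string, so that
// lookups and insertions compare integers rather than strings.
class PropertyMap {
public:
    void set(PropertyId id, double value) { values_[id] = value; }
    bool has(PropertyId id) const { return values_.count(id) != 0; }
    std::size_t size() const { return values_.size(); }

private:
    std::map<int, double> values_;
};
```

The name strings are only needed at the point where the properties are finally written out, so the string table is consulted once per property rather than on every map access.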
+
[[Category:Calc]]

Latest revision as of 06:38, 4 December 2009

The Zaske case

Done. Content relocated to Calc/Performance/The_Zaske_case, section preserved for external references linking here.

Sorting values within functions

Done. Content relocated to Calc/Performance/sorting_values_within_functions, section preserved for external references linking here.

other

For other performance optimization tasks previously located here please see the individual pages listed under Calc/Performance.
