Difference between revisions of "Calc/Performance/misc"

From Apache OpenOffice Wiki
Earlier revision (edit summary: create) compared with the latest revision (edit summary: use SUBPAGENAME in category sort key for reusability); 27 intermediate revisions by 5 users are not shown.

== Calc Optimisation Opportunities ==

There are many opportunities for optimisation in Calc, for various reasons. This list needs extending.

=== Cell size ===

Basic problem: the most basic cell consumes about 50 bytes in total, and more complex cells consume far more memory. There are a number of simple and obvious things to refactor here.

=== ScBaseCell ===

sc/inc/cell.hxx:

<pre>
class ScBaseCell
{
protected:
    ScPostIt*       pNote;
    SvtBroadcaster* pBroadcaster;
    USHORT          nTextWidth;
    BYTE            eCellType;      // enum CellType - BYTE saves memory
    BYTE            nScriptType;
    // ... (remaining members and methods omitted)
};
</pre>

Every cell carries this overhead, yet a good part of it is unnecessary for many cells:

* ScPostIt pointer - very, very infrequently used: almost no cells carry a post-it note.
* SvtBroadcaster - used only by cells that are referenced from another cell by a single-cell (i.e. non-range) reference; again, only a subset of all cells.

Solution: with a little refactoring, we can steal one bit from eCellType to mark a 'special' cell:

<pre>
class ScBaseCell
{
protected:
    USHORT nTextWidth;
    BYTE   eCellType : 7;   // enum CellType - BYTE saves memory
    bool   bSpecial  : 1;   // other information to be looked up elsewhere
    BYTE   nScriptType;
    // ...
};
</pre>

The bSpecial flag could denote that the cell has a note (held in a separate hash) or that it has a single-cell dependent, so both pointers can move out of the cell. That saves two thirds of the base size (8 of the 12 bytes of the members shown, on a 32-bit build) with fairly little effort.
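A minimal sketch of the side-table idea in standard C++17, assuming notes can be modelled as plain strings keyed by cell position. CompactCell, LegacyCell, CellAddress and NoteTable are hypothetical names, not Calc classes; fixed-width integers stand in for USHORT and BYTE, and neither ScAddress nor ScPostIt is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical mirror of the current ScBaseCell members
// (void* stands in for ScPostIt* and SvtBroadcaster*).
struct LegacyCell {
    void*         pNote;
    void*         pBroadcaster;
    std::uint16_t nTextWidth;
    std::uint8_t  eCellType;
    std::uint8_t  nScriptType;
};

// Hypothetical compact layout: one bit stolen from eCellType.
struct CompactCell {
    std::uint16_t nTextWidth;
    std::uint8_t  eCellType : 7;   // enum CellType
    std::uint8_t  bSpecial  : 1;   // extra data lives in a side table
    std::uint8_t  nScriptType;
    CompactCell() : nTextWidth(0), eCellType(0), bSpecial(0), nScriptType(0) {}
};

// Hypothetical cell position key (in Calc, ScAddress plays this role).
struct CellAddress {
    std::uint32_t row;
    std::uint16_t col;
    std::uint16_t tab;
    bool operator==(const CellAddress& o) const {
        return row == o.row && col == o.col && tab == o.tab;
    }
};

struct CellAddressHash {
    std::size_t operator()(const CellAddress& a) const {
        // Pack the three coordinates into one 64-bit key and hash that.
        std::uint64_t key = (std::uint64_t(a.tab) << 48) |
                            (std::uint64_t(a.col) << 32) | a.row;
        return std::hash<std::uint64_t>{}(key);
    }
};

// Notes live here instead of in a per-cell pointer.
class NoteTable {
public:
    void setNote(CompactCell& cell, const CellAddress& pos, std::string text) {
        notes_[pos] = std::move(text);
        cell.bSpecial = 1;   // tell readers to consult the side table
    }
    // Ordinary cells return immediately without touching the hash.
    const std::string* findNote(const CompactCell& cell, const CellAddress& pos) const {
        if (!cell.bSpecial) return nullptr;
        auto it = notes_.find(pos);   // flag may also mean "has a dependent"
        return it == notes_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<CellAddress, std::string, CellAddressHash> notes_;
};
```

The design point is that an ordinary cell pays only for the flag check; the hash lookup is reserved for the rare 'special' cells.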

Latest revision as of 15:24, 6 March 2009

Miscellaneous performance optimization opportunities that don't have their own entry under Calc/To-Dos/Performance/... yet.

In-sheet objects

With a relatively modest number of in-sheet objects (a favorite tool of complex spreadsheet creators), things become horribly slow: a small file with almost no data or macros and only 240 list boxes takes 30 seconds to load (sample document: http://www.openoffice.org/issues/show_bug.cgi?id=41164).

The sheet objects need to be created idly in the svx layer; there is also a floating patch to improve the performance of VCL's control management, where some of the problems lie.
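One possible reading of creating the objects idly: load only a cheap placeholder per object, then build the real control on first use, or a few at a time from an idle handler. The sketch below assumes exactly that. Control, LazyControl and createSomeOnIdle are hypothetical names, not svx or VCL API.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive form control (not a VCL/svx class).
struct Control {
    std::string name;
    explicit Control(std::string n) : name(std::move(n)) {}
};

// Placeholder kept per in-sheet object: cheap to load, real control built later.
class LazyControl {
public:
    explicit LazyControl(std::string name) : name_(std::move(name)) {}
    bool created() const { return ctrl_ != nullptr; }
    // Create the real control the first time anyone needs it (e.g. to paint it).
    Control& get() {
        if (!ctrl_) ctrl_ = std::make_unique<Control>(name_);
        return *ctrl_;
    }
private:
    std::string name_;
    std::unique_ptr<Control> ctrl_;
};

// Intended to run from an idle handler: build at most `budget` pending controls
// per call, so loading returns quickly and the rest are created afterwards.
std::size_t createSomeOnIdle(std::vector<LazyControl>& objs, std::size_t budget) {
    std::size_t made = 0;
    for (auto& o : objs) {
        if (made == budget) break;
        if (!o.created()) { o.get(); ++made; }
    }
    return made;
}
```

With 240 list boxes, loading then costs 240 placeholders; each control is paid for only when it is first shown or when the idle budget reaches it.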

Large / complex pivot sheets

The existing Data Pilot implementation does not keep a shared, normalized form of the data, i.e. one in which each field value is reduced to an ordinal for O(1) lookup. We should implement such a Data Pilot cache, using a representation compatible with the PivotTable cache so that it can be populated from that cache on import.
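A minimal sketch of such a normalized form, assuming string-valued fields only (a real cache would also have to handle numbers, dates and empty cells). Each distinct value is stored once and every row keeps a small ordinal, so mapping a value to its ordinal is an average O(1) hash lookup and aggregation can index plain arrays. FieldCache and countPerValue are hypothetical names, and nothing here reflects the actual PivotTable cache format.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-field cache: distinct values stored once, rows hold ordinals.
class FieldCache {
public:
    // Returns the ordinal for `value`, adding it if unseen (O(1) on average).
    std::size_t intern(const std::string& value) {
        auto it = index_.find(value);
        if (it != index_.end()) return it->second;
        std::size_t ord = values_.size();
        values_.push_back(value);
        index_.emplace(value, ord);
        return ord;
    }
    void addRow(const std::string& value) { rows_.push_back(intern(value)); }

    std::size_t rowCount() const { return rows_.size(); }
    std::size_t ordinalAt(std::size_t row) const { return rows_[row]; }
    const std::string& valueOf(std::size_t ord) const { return values_[ord]; }
    std::size_t distinctCount() const { return values_.size(); }
private:
    std::vector<std::string> values_;                      // ordinal -> value
    std::unordered_map<std::string, std::size_t> index_;   // value -> ordinal
    std::vector<std::size_t> rows_;                        // row -> ordinal
};

// Example collation: count rows per distinct value using an array indexed by ordinal.
std::vector<std::size_t> countPerValue(const FieldCache& f) {
    std::vector<std::size_t> counts(f.distinctCount(), 0);
    for (std::size_t r = 0; r < f.rowCount(); ++r) ++counts[f.ordinalAt(r)];
    return counts;
}
```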

Threaded calculation

Ideally, to scale on hyper-threaded machines, we need to analyze a workbook's dependency graph and then thread the calculation.
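One common way to thread a dependency graph is to split it into levels with a topological sort (Kahn's algorithm), so that every cell in a level depends only on cells in earlier levels, and then compute each level in parallel. The sketch below assumes a hypothetical formula model (each cell is a constant plus the sum of the cells it references) and hypothetical names Sheet, levelize and calculate; it is not how Calc represents formulas.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical formula model: cell i = base[i] + sum of the cells in deps[i].
struct Sheet {
    std::vector<double> base;
    std::vector<std::vector<std::size_t>> deps;   // cells that cell i reads
};

// Kahn's algorithm: group cells into levels so that each cell depends only
// on cells in earlier levels.
std::vector<std::vector<std::size_t>> levelize(const Sheet& s) {
    const std::size_t n = s.base.size();
    std::vector<std::vector<std::size_t>> users(n);   // reverse edges
    std::vector<std::size_t> pending(n), level(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        pending[i] = s.deps[i].size();
        for (std::size_t d : s.deps[i]) users[d].push_back(i);
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::vector<std::size_t>> levels;
    std::size_t done = 0;
    while (!ready.empty()) {
        const std::size_t c = ready.front();
        ready.pop();
        ++done;
        if (levels.size() <= level[c]) levels.resize(level[c] + 1);
        levels[level[c]].push_back(c);
        for (std::size_t u : users[c]) {
            level[u] = std::max(level[u], level[c] + 1);
            if (--pending[u] == 0) ready.push(u);
        }
    }
    if (done != n) throw std::runtime_error("circular reference");
    return levels;
}

// Compute level by level; within a level the cells are split across threads.
std::vector<double> calculate(const Sheet& s, unsigned nThreads) {
    if (nThreads == 0) nThreads = 1;
    std::vector<double> value(s.base.size(), 0.0);
    for (const auto& lvl : levelize(s)) {
        const std::size_t chunk = (lvl.size() + nThreads - 1) / nThreads;
        std::vector<std::thread> workers;
        for (std::size_t start = 0; start < lvl.size(); start += chunk) {
            const std::size_t end = std::min(lvl.size(), start + chunk);
            workers.emplace_back([&, start, end] {
                for (std::size_t k = start; k < end; ++k) {
                    const std::size_t c = lvl[k];
                    double v = s.base[c];
                    for (std::size_t d : s.deps[c]) v += value[d];   // earlier level, final
                    value[c] = v;   // each cell is written by exactly one thread
                }
            });
        }
        for (auto& w : workers) w.join();   // barrier before the next level
    }
    return value;
}
```

Joining the workers after each level acts as a barrier. The trade-off is that a long chain of dependent cells produces one level per cell and so runs serially, which is why the shape of the workbook's graph matters. On Linux, compile with -pthread.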

Similarly, constructing a Data Pilot cache and subsequently collating its data are both amenable to threading.
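Building the cache parallelizes naturally across fields, because encoding one field reads only its own column. A sketch under that assumption, using std::async with one task per field (C++17). EncodedField, encodeField and buildCacheParallel are hypothetical names, and the collation step is not shown.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical encoded field: distinct values plus one ordinal per row.
struct EncodedField {
    std::vector<std::string> values;   // ordinal -> value
    std::vector<std::size_t> rows;     // row -> ordinal
};

// Encode one column; touches only its own input and output, so no locking is needed.
EncodedField encodeField(const std::vector<std::string>& column) {
    EncodedField f;
    std::unordered_map<std::string, std::size_t> index;
    f.rows.reserve(column.size());
    for (const auto& v : column) {
        auto [it, inserted] = index.try_emplace(v, f.values.size());
        if (inserted) f.values.push_back(v);
        f.rows.push_back(it->second);
    }
    return f;
}

// Fields are independent, so each one is encoded by its own asynchronous task.
std::vector<EncodedField> buildCacheParallel(
        const std::vector<std::vector<std::string>>& columns) {
    std::vector<std::future<EncodedField>> tasks;
    for (const auto& col : columns)
        tasks.push_back(std::async(std::launch::async, encodeField, std::cref(col)));
    std::vector<EncodedField> cache;
    for (auto& t : tasks) cache.push_back(t.get());   // waits for each task in order
    return cache;
}
```

One thread per field is the simplest choice; with many fields, a fixed-size pool would bound the thread count.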
