Calc/Performance/misc

Calc Optimization Opportunities

There are a lot of opportunities in calc for various reasons. This list needs extending:

Cell size

Basic problem: the most basic cell consumes about 50bytes all told, more complex cells consume far more memory, there are a number of simple & obvious things to re-factor here.

It's trivial to calculate the average cost - simply create a sheet with several thousand cells in it, and measure the heap allocation change on load - with memprof or some other tool, then divide by the number of cells. Similarly, wins are easy to measure this way.

ScBaseCell

sc/inc/cell.hxx; code in sc/source/core/data/cell.cxx & cell2.cxx

class ScBaseCell
{
protected:
	ScPostIt*		pNote;
	SvtBroadcaster*	pBroadcaster;
	USHORT			nTextWidth;
	BYTE			eCellType;		// enum CellType - BYTE spart Speicher
	BYTE			nScriptType;

Every cell carries this overhead; note that a chunk of it is not necessary for many cells:

ScPostIt pointer - very, very infrequently used - we have almost no post-it note per cell.
SvtBroadcaster - used by cells that are referenced (by a single cell (ie. non-range) reference) from another cell - again, a sub-set of all cells.

Solutions: a little re-factoring required, but stealing a bit-field from eCellType to denote a 'special' cell:

class ScBaseCell
{
protected:
	USHORT			nTextWidth;
	BYTE			eCellType : 7;  // enum CellType - BYTE spart Speicher
        bool                    bSpecial : 1;   // other information to be looked up elsewhere
	BYTE			nScriptType;

The 'bSpecial' flag could be used to denote that there is a 'note' for this cell (in a separate hash), or that this cell has a single-cell dependant. So - we can save 2/3rds of the base size with fairly little effort.

ScStringCell

In Excel similar strings across the sheet are shared. Our string class includes a sharable, immutable reference counted string - but it's not clear that we use this as intelligently as we should - pwrt. XML import / export. Possibly we should be trying to detect common strings there with a small back-hash.

ScFormulaCell

There are a number of problems here:

redundant data: apparently we store redundant data for values. Certainly we store the 'rendered' or computed value of the cell regardless of whether it is likely to be rendered.

ScFormulaCell inherits from svt/inc/listener.h - which has a virtual destructor, hence we have a vtable pointer per instance too (most likely unnecessary), as well as the listener list.

Document pointer - as in the ScEditCell structure we lug around a document pointer we should 'know' as implicit context.

Shared formulae - Excel will 'share' formulae - ie. very little state is duplicated if you fill a column 'D' with =((A1+B1)/C1)* SQRT(A1) or whatever. Calc by contrast will duplicate this formulae innumerable times. We need to extract immutable, position independant formula objects, reference count & share these; plus of course, elide duplicates on import. This would give a massive memory saving for large sheets - it's very common to share formulae.

Splitting Matrix pieces - perhaps possible to split: nMatCols, nMatRows (6bytes), pMatrix (4-8bytes), and perhaps cMatrixFlag (1byte) into a derived 'ScMatrixFormula' sub-class ? specific to matrix formulae: a sub-set of formulae.

In-sheet objects

With a relatively modest number of in-sheet objects (which are favorite tools of complex spreadsheet creators) things become horribly slow: 30secs to load a small file with ~no data / macros & only 240 list boxes sample document.

The sheet objects need idly creating in the svx layer; also there is a floating patch to improve VCL's control management performance - wherein some of the problems lie.

Large / complex pivot sheets

Jody has some sample data here - links ? - internal Novell sheet (commonly used) taking ~3 hours to re-compute.

threaded calculation

Ideally to scale to hyper-threaded machines we need to crunch a workbook's dependency graph & then thread the calcuation.

Chart Optimization

The existing chart component performs extremely poorly. However - it's likely that optimizing chart2 is a better bet. Having said that, there seem to often be a large number of re-calculation / re-renderings of charts on load that are perhaps realated to a calc/chart2 mis-interaction [jody- substantiate?]

In addition the chart likes to do (expensive) label / string size calculation for (potentially) tens of thousands of labels it will never use for scatter plots, multiple times each.

Calc/Performance/misc

Contents

Calc Optimization Opportunities

Cell size

ScBaseCell

ScStringCell

ScFormulaCell

In-sheet objects

Large / complex pivot sheets

threaded calculation

Chart Optimization

Views

Personal tools

Navigation

Search

Tools