Calc/Implementation/Data Model for sheet and cell

From Apache OpenOffice Wiki

The relationship between document, table, column and cell

From the user's point of view, a Spreadsheet document is composed of several tables, and each table contains many cells organized in rows and columns. The code reflects this structure. ScDocument (the Spreadsheet document) has a one-dimensional, fixed-size array member (MAXTABCOUNT=256). Each element of that array is a pointer to a ScTable.

ScTable, however, does not contain a two-dimensional array covering every cell, because most cells are empty. Storing every cell would need a huge amount of memory: 1G (1M rows * 1024 columns) cell objects or pointers for a single table. Instead, cells are organized by column. ScTable has a fixed-size array member (MAXCOLCOUNT=1024), and each element of that array is a ScColumn object. ScColumn stores only the cells that are needed: cells that are not empty, and empty cells that are directly referenced by another cell. ScColumn uses a dynamic array, ScColumn::pItems, to store these cells.
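The layout described above can be sketched in a few lines of C++. This is a simplified illustration, not the actual OpenOffice code: the types Document, Table, Column, ColEntry and Cell and the helpers EnsureTable, AppendCell and StoredCellCount are hypothetical stand-ins for ScDocument, ScTable, ScColumn and their members. Only the two array sizes are taken from the text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for ScDocument / ScTable / ScColumn.
constexpr std::size_t kMaxTabCount = 256;   // MAXTABCOUNT in the text
constexpr std::size_t kMaxColCount = 1024;  // MAXCOLCOUNT in the text

struct Cell {
    std::string text;
};

// One stored cell: its row number plus a pointer to the content.
struct ColEntry {
    int row;
    std::unique_ptr<Cell> cell;
};

// A column keeps only the cells that exist, ordered by row.
struct Column {
    std::vector<ColEntry> items;
};

// A table is a fixed-size array of columns, not a rows x columns grid.
struct Table {
    std::array<Column, kMaxColCount> columns;
};

// A document is a fixed-size array of table pointers.
struct Document {
    std::array<std::unique_ptr<Table>, kMaxTabCount> tables;

    Table& EnsureTable(std::size_t tab) {
        assert(tab < kMaxTabCount);
        if (!tables[tab]) tables[tab] = std::make_unique<Table>();
        return *tables[tab];
    }

    // Appends a cell below all cells already stored in the column.
    void AppendCell(std::size_t tab, std::size_t col, int row, std::string text) {
        assert(col < kMaxColCount);
        Column& column = EnsureTable(tab).columns[col];
        assert(column.items.empty() || column.items.back().row < row);
        auto cell = std::make_unique<Cell>();
        cell->text = std::move(text);
        column.items.push_back(ColEntry{row, std::move(cell)});
    }

    // Memory use follows the number of stored cells, not the grid size.
    std::size_t StoredCellCount() const {
        std::size_t n = 0;
        for (const auto& table : tables) {
            if (!table) continue;
            for (const Column& column : table->columns) n += column.items.size();
        }
        return n;
    }
};
```

In this sketch, two cells in the same column, one near the top and one close to row 1M, cost two entries rather than a million slots.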

Column, the building block of Spreadsheet

The cell is the basic element of a Spreadsheet. Two kinds of data structure are related to a cell: one holds its content, ScBaseCell (and its subclasses); the other holds its properties, ScPatternAttr. ScColumn stores not only cell content but also cell properties.

ScColumn::pItems is the dynamic array that holds the cells. The cells in ScColumn::pItems are ordered by row number, from small to large. Each element of the array is a ColEntry object, which holds a ScBaseCell pointer and the corresponding row number (nRow). ScColumn::nCount is the number of cells stored in the column. ScColumn::nLimit is the allocated size of the dynamic array, that is, the number of cells it can hold without being resized. Please refer to ScColumn::Insert(SCROW nRow, ScBaseCell* pNewCell), ScColumn::Delete(SCROW nRow) and ScColumn::Append(SCROW nRow, ScBaseCell* pNewCell) for more information about how this dynamic array works. A cell can be retrieved by index (pItems[index]). The index for a given row number can be found with ScColumn::Search(SCROW nRow, SCSIZE& nIndex).

Every cell has properties, but not every cell's properties are stored as an individual object. If adjacent cells have the same properties, they share one ScPatternAttr object. ScAttrArray is the data structure that holds the cell properties of one column. ScAttrArray::pData is the dynamic array that holds the properties. ScAttrArray::nCount and ScAttrArray::nLimit have the same meaning as in the other dynamic array. Each element of the array is a ScAttrEntry object, which holds a ScPatternAttr pointer and a row number (nRow). The nRow value indicates the last row of a range of adjacent cells that have the same properties. For example, if all rows from 7 to the end of the column have the same properties, a single ScAttrEntry covers that whole range.
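The following is a minimal sketch of a row-ordered dynamic array with search, insert and delete. It is not the real ScColumn code: Entry and SparseColumn are hypothetical names, a std::string stands in for the ScBaseCell pointer, and std::vector takes the place of the manual nCount/nLimit bookkeeping. The behaviour of Search when the row is missing (returning the insertion position in index) is an assumption made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified stand-in for ColEntry.
struct Entry {
    int row;
    std::string value;  // the real ColEntry holds a ScBaseCell* instead
};

class SparseColumn {
public:
    // Binary search by row. Returns true and sets index if the row is stored.
    // Otherwise sets index to the position where the row would be inserted
    // (an assumption of this sketch).
    bool Search(int row, std::size_t& index) const {
        auto it = std::lower_bound(
            items_.begin(), items_.end(), row,
            [](const Entry& e, int r) { return e.row < r; });
        index = static_cast<std::size_t>(it - items_.begin());
        return it != items_.end() && it->row == row;
    }

    // Inserts a new cell or replaces the existing one, keeping row order.
    void Insert(int row, std::string value) {
        assert(row >= 0);
        std::size_t index = 0;
        if (Search(row, index)) {
            items_[index].value = std::move(value);
        } else {
            items_.insert(items_.begin() + static_cast<std::ptrdiff_t>(index),
                          Entry{row, std::move(value)});
        }
    }

    // Removes the cell at the given row; returns false if none is stored.
    bool Delete(int row) {
        std::size_t index = 0;
        if (!Search(row, index)) return false;
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(index));
        return true;
    }

    std::size_t Count() const { return items_.size(); }
    const Entry& At(std::size_t index) const { return items_[index]; }

private:
    std::vector<Entry> items_;  // sorted by row, only stored cells
};
```

Because the array stays sorted, a lookup by row number takes a logarithmic number of steps in the count of stored cells, independent of how many rows the sheet has.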


Initially, ScAttrArray has a single element in pData: pData[0].nRow is MAX_ROW and pData[0].pPattern is the default pattern. This means the whole column has the default properties. When the user sets a cell's properties, the affected range may be split into pieces and new ScAttrEntry elements added. Setting properties may also merge adjacent ScAttrEntry elements into one, when they end up with the same properties. Please refer to ScAttrArray::SetPatternArea(...) for more detail.
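The split and merge behaviour can be illustrated with a small sketch. It is not the real ScAttrArray::SetPatternArea implementation: AttrEntry, AttrArray, PatternAt and Entries are hypothetical names, an int id stands in for the ScPatternAttr pointer, and the array is simply rebuilt on every change, which the real code need not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ScAttrEntry: the pattern applies up to lastRow.
struct AttrEntry {
    int lastRow;
    int pattern;  // the real entry holds a ScPatternAttr pointer
};

class AttrArray {
public:
    // Starts with one entry covering the whole column with the default pattern.
    AttrArray(int maxRow, int defaultPattern)
        : maxRow_(maxRow), data_{AttrEntry{maxRow, defaultPattern}} {}

    // The pattern of a row is found in the first entry whose lastRow >= row.
    int PatternAt(int row) const {
        assert(row >= 0 && row <= maxRow_);
        auto it = std::lower_bound(
            data_.begin(), data_.end(), row,
            [](const AttrEntry& e, int r) { return e.lastRow < r; });
        return it->pattern;
    }

    // Applies pattern to rows [startRow, endRow], splitting the runs it cuts
    // through and merging neighbouring runs that end up with equal patterns.
    void SetPatternArea(int startRow, int endRow, int pattern) {
        assert(0 <= startRow && startRow <= endRow && endRow <= maxRow_);
        std::vector<AttrEntry> out;
        auto push = [&out](int lastRow, int pat) {
            if (!out.empty() && out.back().pattern == pat) {
                out.back().lastRow = lastRow;  // merge with previous run
            } else {
                out.push_back(AttrEntry{lastRow, pat});
            }
        };
        int runStart = 0;
        bool inserted = false;
        for (const AttrEntry& e : data_) {
            // Part of this run that lies above the area.
            if (runStart < startRow) push(std::min(e.lastRow, startRow - 1), e.pattern);
            // The new area itself, emitted once.
            if (!inserted && e.lastRow >= startRow) {
                push(endRow, pattern);
                inserted = true;
            }
            // Part of this run that lies below the area.
            if (e.lastRow > endRow) push(e.lastRow, e.pattern);
            runStart = e.lastRow + 1;
        }
        data_.swap(out);
    }

    const std::vector<AttrEntry>& Entries() const { return data_; }

private:
    int maxRow_;
    std::vector<AttrEntry> data_;
};
```

Setting a pattern on rows 3 to 6 of a default column yields three entries ending at rows 2, 6 and the last row. Setting those rows back to the default pattern merges them into one entry again.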

SfxPoolItem, SfxItemPool and SfxItemSet

A SfxPoolItem represents one property of an object. For example, SvxBrushItem (a subclass of SfxPoolItem) represents the background of an object, where the object can be a cell, a drawing object, etc., and SvxFontHeightItem represents the font size of characters. Every SfxPoolItem has an nWhich value as its identity.

Every object has several properties, and a complex object may have tens of them. For example, a Spreadsheet cell can have 53 properties. There are thousands of objects in one document. If the document model stored every property of every object independently, it would use a huge amount of memory. So there is a pool to hold the SfxPoolItems: SfxItemPool.

Let's take ScDocumentPool as an example. A SfxItemPool can hold many designated SfxPoolItems (those whose nWhich id is between nStart and nEnd). It has a two-level pointer array. The first level holds the different kinds of SfxPoolItem. The second level holds the different instances of one kind of SfxPoolItem. SfxItemPool has a member pImp that points to a SfxItemPool_Impl structure, which has a member ppPoolItems that points to a pointer array (blue part). ScDocumentPool can therefore be regarded as having a pointer array that holds the SfxPoolItems whose nWhich ids range from 100 to 188. Every pointer in that array points to another structure, SfxPoolItemArray_Impl. SfxPoolItemArray_Impl has a member, pData, which points to an array of SfxPoolItem pointers (green part). Every element of this array is a pointer to a real SfxPoolItem object.


All SfxPoolItems in the pool are distinct. When the user sets a property of an object, the corresponding SfxPoolItem has to be set on the object. Before the item is put into the pool, the pool checks whether an equal SfxPoolItem already exists. If it exists, the pooled SfxPoolItem is returned and set on the object. If it does not exist, the new item is added to the pool. So every SfxPoolItem object in the pool differs from all the others. Please refer to SfxItemPool::Put(const SfxPoolItem& rItem, USHORT nWhich) for details.

The properties of all objects are put into the pool. Because the pool holds only one SfxPoolItem instance per value, and because most objects have similar properties, memory use stays moderate even when there are many objects with many properties.

A SfxPoolItem is shared by many objects, so its life cycle is not the same as that of any one object. It uses a reference count. When a SfxPoolItem is taken from the pool and set on an object, its reference count is increased by one. When a property is removed from an object, the reference count of the SfxPoolItem is decreased by one. Please refer to SfxItemPool::Put(const SfxPoolItem& rItem, USHORT nWhich) and SfxItemPool::Remove(const SfxPoolItem& rItem) for details.

SfxItemSet is a set of SfxPoolItems. Which SfxPoolItems can be put into the set is decided when the SfxItemSet is constructed. SfxItemSet::_pWhichRanges specifies the ranges of nWhich ids. SfxItemSet::_aItems is a pointer array, and each pointer points to the real SfxPoolItem object in the pool.
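A minimal sketch of such a pool, with a two-level array and reference counting, might look as follows. It is an illustration only, not the SfxItemPool implementation: PoolItem, ItemPool and InstanceCount are hypothetical names, a std::string stands in for the item's value, and freeing an item as soon as its count reaches zero is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a pooled SfxPoolItem.
struct PoolItem {
    int which;          // identity of the property kind
    std::string value;  // stands in for the property's actual value
    int refCount;
};

class ItemPool {
public:
    // First level: one slot per which id in [start, end].
    ItemPool(int start, int end)
        : start_(start), slots_(static_cast<std::size_t>(end - start + 1)) {}

    // Returns the pooled instance equal to the request, adding it if needed.
    // The reference count is increased by one either way.
    const PoolItem& Put(int which, const std::string& value) {
        auto& instances = Slot(which);
        for (auto& item : instances) {
            if (item->value == value) {
                ++item->refCount;
                return *item;
            }
        }
        instances.push_back(
            std::unique_ptr<PoolItem>(new PoolItem{which, value, 1}));
        return *instances.back();
    }

    // Drops one reference. Freeing at zero is an assumption of this sketch.
    void Remove(const PoolItem& item) {
        auto& instances = Slot(item.which);
        for (auto it = instances.begin(); it != instances.end(); ++it) {
            if (it->get() != &item) continue;
            if (--(*it)->refCount == 0) instances.erase(it);
            return;
        }
    }

    std::size_t InstanceCount(int which) { return Slot(which).size(); }

private:
    // Second level: the distinct instances of one kind of item.
    std::vector<std::unique_ptr<PoolItem>>& Slot(int which) {
        assert(which >= start_ &&
               static_cast<std::size_t>(which - start_) < slots_.size());
        return slots_[static_cast<std::size_t>(which - start_)];
    }

    int start_;
    std::vector<std::vector<std::unique_ptr<PoolItem>>> slots_;
};
```

Two objects that ask for the same value receive the same PoolItem, so an object only needs to keep a pointer per property rather than its own copy.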

ScPatternAttr, SfxSetItem and SfxItemSet

SfxSetItem is a subclass of SfxPoolItem. It is a kind of SfxPoolItem that simply wraps a SfxItemSet. ScPatternAttr is a subclass of SfxSetItem. It represents a cell's attributes, including text attributes, number format attributes, background attributes, border attributes, etc.

Most SfxPoolItems do not have many instances, because a specific property does not take many different values in one document. A ScPatternAttr, however, can have up to 53 sub attributes, so it may have a lot of distinct instances, which makes SfxPoolItemArray_Impl::pData large. Putting a ScPatternAttr into the pool therefore needs a lot of comparisons, and each comparison of two ScPatternAttr objects has to compare all of their sub attributes. This is time consuming, especially when putting a ScPatternAttr that already exists in the pool. Developers need to pay attention to this potential performance bottleneck.
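To make the cost concrete, here is a small sketch that counts sub attribute comparisons during a pool lookup. It is a simplified model under stated assumptions, not the real code: Pattern and PatternPool are hypothetical names, each of the 53 slots holds an int id instead of a pointer to a pooled item, and the pool is searched linearly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSubAttrCount = 53;  // number of sub attributes in the text

// Hypothetical stand-in for ScPatternAttr: one id per sub attribute (0 = unset).
using Pattern = std::array<int, kSubAttrCount>;

struct PatternPool {
    std::vector<Pattern> patterns;
    std::size_t slotComparisons = 0;  // total sub attribute comparisons so far

    // Compares slot by slot and stops at the first difference.
    bool Equal(const Pattern& a, const Pattern& b) {
        for (std::size_t i = 0; i < kSubAttrCount; ++i) {
            ++slotComparisons;
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Linear search for an equal pattern; adds the pattern if none is found.
    // Returns the index of the pooled pattern.
    std::size_t Put(const Pattern& p) {
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            if (Equal(patterns[i], p)) return i;
        }
        patterns.push_back(p);
        return patterns.size() - 1;
    }
};
```

In this model a match can only be confirmed after all 53 slots have been compared, and patterns that differ only in a late slot cost almost as much to reject, so the work grows with both the number of pooled patterns and the number of sub attributes.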
