Hacking Calc - The First Step

From Apache OpenOffice Wiki
Revision as of 12:28, 16 December 2011 by TJFrazier (Talk | contribs)


So, you have decided that you want to hack OpenOffice.org (OO.o) Calc to fix a bug that's been annoying you to death, to enhance a feature that you believe is not as good as it could be, or perhaps to improve performance and reduce the memory footprint. Whatever your reason may be, you need to know where to start if you don't have any prior experience with OO.o's codebase. The best way to become familiar with the codebase is simply to experiment a lot: modify the code and see what changes. There will probably be a lot of trial and error at first, but hopefully this guide will help first-time OO.o hackers modify Calc to achieve some cool stuff.

Who this guide is for

This guide is intended for C++ programmers who are already familiar with how to download the source code from CVS and do a complete build, and perhaps how to rebuild an individual module. If you aren't, then the Tools project homepage may be a good starting point. Don't forget to visit the infamous Hacking page to get a general overview of how the build system works and how to work with it.

This guide is, however, not intended for someone who wants to develop an add-on component to OO.o by using the UNO component technology. There are a number of good articles on that all over the web, but perhaps you may want to start with Uno/Articles&Tutorials first.

Brief Summary of Calc's Class Structure

Document and View

Calc's class structure is built around two core classes named ScTabView and ScDocument, which, as their names suggest, represent Calc's view and document components, respectively. If you are familiar with the model-view-controller (MVC) architecture, this pattern probably sounds familiar. The controller component of the MVC architecture is absent in Calc's class structure, however, as its role is covered by the underlying graphic sub-system layer (vcl module), where user inputs (mouse and keyboard inputs) are captured and sent to each application as event objects.

Another important class to note here is ScViewData, whose primary role is to serve as the persistent storage of view data, with the secondary role being the liaison between the document and view classes. Since ScTabView is the owner of the only instance of ScViewData, all of its child classes (ScViewFunc, ScDBFunc, and ScTabViewShell) can get easy access to ScViewData by simply calling their inherited GetViewData() method.

The following diagram outlines the relationships between the main view classes which are subclasses of ScTabView, the ScViewData class, and the ScDocument class. The ScDocument class is the top-most class of the document class hierarchy, which is discussed in detail in the next section. The vertical arrows indicate class inheritance, whereas the horizontal arrows indicate usage with a getter function name.

Relationship between ScViewData, ScDocument, and the subclasses of ScTabView.
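The ownership and access pattern described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: Document, ViewData, TabView, ViewFunc, DBFunc and TabViewShell are hypothetical, heavily simplified stand-ins for the real Sc* classes, the members shown are a tiny subset, and deriving TabViewShell directly from DBFunc is an assumption made to keep the chain short.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ScDocument.
struct Document
{
    std::string aName;
};

// Hypothetical stand-in for ScViewData: stores view state and
// links the view classes to the document.
class ViewData
{
public:
    explicit ViewData( Document* pDoc ) : mpDoc( pDoc ), mnCurX( 0 ), mnCurY( 0 )
    {
        assert( pDoc != nullptr );
    }

    Document* GetDocument() const { return mpDoc; }
    int GetCurX() const { return mnCurX; }
    int GetCurY() const { return mnCurY; }
    void SetCursor( int nX, int nY ) { mnCurX = nX; mnCurY = nY; }

private:
    Document* mpDoc;
    int mnCurX;
    int mnCurY;
};

// Hypothetical stand-in for ScTabView: owns the single ViewData instance.
class TabView
{
public:
    explicit TabView( Document* pDoc ) : maViewData( pDoc ) {}
    ViewData* GetViewData() { return &maViewData; }

private:
    ViewData maViewData;
};

// The subclasses add nothing here; they only inherit GetViewData().
class ViewFunc : public TabView
{
public:
    explicit ViewFunc( Document* pDoc ) : TabView( pDoc ) {}
};

class DBFunc : public ViewFunc
{
public:
    explicit DBFunc( Document* pDoc ) : ViewFunc( pDoc ) {}
};

class TabViewShell : public DBFunc
{
public:
    explicit TabViewShell( Document* pDoc ) : DBFunc( pDoc ) {}
};
```

Any object in the chain, down to TabViewShell, reaches the same ViewData through the inherited getter, and from there the document.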

Document Structure

The ScDocument class represents the spreadsheet document as a whole, and it owns a static-size array of pointers to ScTable instances which are allocated dynamically on the heap. Internally, Calc's 2-dimensional sheet structure is column-first and row-second; the sheet is first partitioned into a fixed number of columns (256 columns as of version 2.0), and each column instance has a varying number of cell instances.

While memory is allocated statically for the column instances when a ScTable object is instantiated, individual cell instances are dynamically allocated or deallocated as they are filled or emptied. Now, the allocated cells that are contiguous in memory do not necessarily represent contiguous cells on screen; even if the non-empty cells are sparsely positioned on screen (e.g. cells at row 1 and 100 contain data, but the cells at rows 2-99 are empty), their locations in memory are contiguous, and Calc uses a specialized lookup algorithm to retrieve a cell's memory location from its row ID. See the ScColumn::Search() method for the actual implementation of this lookup algorithm (located in source/core/data/column.cxx).

This mechanism helps to keep the memory footprint low for a sheet with only sparsely-populated cells, independent of their row position (i.e. it makes no difference memory-footprint-wise whether the non-empty cell is at row 1 or at row 65535).
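As a rough illustration of the idea, the following sketch stores only the non-empty cells of one column in a contiguous array sorted by row, and finds a row by binary search. SparseColumn and CellEntry are hypothetical names, and the use of binary search is an assumption made for this example; consult ScColumn::Search() for what Calc actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry: one non-empty cell and the row it lives in.
struct CellEntry
{
    int nRow;
    std::string aValue;
};

// Hypothetical simplified column: only non-empty cells are stored,
// kept sorted by row so a row can be located by binary search.
class SparseColumn
{
public:
    // Returns true and sets rIndex to the entry's position if nRow is stored.
    // Otherwise returns false and sets rIndex to the insertion position.
    bool Search( int nRow, std::size_t& rIndex ) const
    {
        auto it = std::lower_bound( maItems.begin(), maItems.end(), nRow,
            []( const CellEntry& rEntry, int n ) { return rEntry.nRow < n; } );
        rIndex = static_cast<std::size_t>( it - maItems.begin() );
        return it != maItems.end() && it->nRow == nRow;
    }

    // Stores a value, overwriting an existing cell or inserting a new one.
    void Set( int nRow, const std::string& rValue )
    {
        assert( nRow >= 0 );
        std::size_t nIndex = 0;
        if ( Search( nRow, nIndex ) )
            maItems[nIndex].aValue = rValue;
        else
            maItems.insert( maItems.begin() + static_cast<std::ptrdiff_t>( nIndex ),
                            CellEntry{ nRow, rValue } );
    }

    // Returns the stored value, or nullptr for an empty cell.
    const std::string* Get( int nRow ) const
    {
        std::size_t nIndex = 0;
        return Search( nRow, nIndex ) ? &maItems[nIndex].aValue : nullptr;
    }

    std::size_t Count() const { return maItems.size(); }

private:
    std::vector<CellEntry> maItems;
};
```

With this layout, a column holding cells at rows 0 and 99 uses two entries, and the 98 empty rows in between cost nothing.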

Every non-empty cell can be one of five types: value cell, string cell, formula cell, note cell, and edit cell. These cell types are represented by the following classes, respectively: ScValueCell, ScStringCell, ScFormulaCell, ScNoteCell, and ScEditCell. All of these classes are derived from the common base class named ScBaseCell, so that each ScColumn object can store any type of cell simply by storing it as ScBaseCell. No virtual methods are used in ScBaseCell, to avoid a large per-cell virtual function pointer. Instead, the effect is achieved with a small type ID (enum eCellType data member of ScBaseCell). See the ScBaseCell::Delete() method for an example of a pseudo-virtual method using the type ID.
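The pseudo-virtual technique can be sketched as follows. This is a minimal illustration, not the real class hierarchy: BaseCell, ValueCell, StringCell and the CellType enum are hypothetical names, only two cell types are modelled, storing the type ID in a single byte is a choice made for this sketch, and the two global counters exist only to make the effect observable.

```cpp
#include <cassert>
#include <string>

enum CellType { CELLTYPE_VALUE, CELLTYPE_STRING };

// Counters used only to observe which destructor ran.
int g_nValueCellsDestroyed = 0;
int g_nStringCellsDestroyed = 0;

class BaseCell
{
public:
    CellType GetCellType() const { return static_cast<CellType>( mnType ); }
    void Delete();   // pseudo-virtual: dispatches on the type ID

protected:
    explicit BaseCell( CellType eType )
        : mnType( static_cast<unsigned char>( eType ) ) {}
    ~BaseCell() {}   // non-virtual and protected: callers must use Delete()

private:
    unsigned char mnType;   // small type ID instead of a vtable pointer
};

class ValueCell : public BaseCell
{
public:
    explicit ValueCell( double fValue )
        : BaseCell( CELLTYPE_VALUE ), mfValue( fValue ) {}
    ~ValueCell() { ++g_nValueCellsDestroyed; }
    double GetValue() const { return mfValue; }

private:
    double mfValue;
};

class StringCell : public BaseCell
{
public:
    explicit StringCell( const std::string& rText )
        : BaseCell( CELLTYPE_STRING ), maText( rText ) {}
    ~StringCell() { ++g_nStringCellsDestroyed; }

private:
    std::string maText;
};

// Casts to the concrete type named by the ID, then deletes through it,
// so the right destructor runs without any virtual function.
void BaseCell::Delete()
{
    switch ( GetCellType() )
    {
        case CELLTYPE_VALUE:
            delete static_cast<ValueCell*>( this );
            break;
        case CELLTYPE_STRING:
            delete static_cast<StringCell*>( this );
            break;
        default:
            assert( !"unknown cell type" );
    }
}
```

The price of this design is that every new cell type must be added to each such switch by hand, which is the trade-off for keeping each cell object small.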

Built-in Cell Functions

The majority of Calc's built-in functions are implemented in class ScInterpreter, though some are implemented in the separate scaddins module. This class is instantiated by ScFormulaCell to evaluate a function, or functions, typed into the cell formula. Class ScTokenArray is used to break up a nested formula expression into separate tokens so that the tokens are evaluated in the right RPN order.
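To show what evaluating tokens in RPN order means, here is a small stack-based evaluator. It is a sketch under assumptions: Token, TokenType and EvaluateRpn are hypothetical names, the tokens are assumed to be in RPN order already, and only four binary operators are supported. It is not the ScInterpreter or ScTokenArray implementation.

```cpp
#include <cassert>
#include <vector>

enum TokenType { TOKEN_NUMBER, TOKEN_ADD, TOKEN_SUB, TOKEN_MUL, TOKEN_DIV };

// Hypothetical token: fValue is only meaningful for TOKEN_NUMBER.
struct Token
{
    TokenType eType;
    double fValue;
};

// Evaluates tokens that are already in RPN order: operands are pushed,
// and each operator pops two operands and pushes its result.
double EvaluateRpn( const std::vector<Token>& rTokens )
{
    std::vector<double> aStack;
    for ( const Token& rToken : rTokens )
    {
        if ( rToken.eType == TOKEN_NUMBER )
        {
            aStack.push_back( rToken.fValue );
            continue;
        }

        assert( aStack.size() >= 2 );
        const double fRight = aStack.back();
        aStack.pop_back();
        const double fLeft = aStack.back();
        aStack.pop_back();

        double fResult = 0.0;
        switch ( rToken.eType )
        {
            case TOKEN_ADD: fResult = fLeft + fRight; break;
            case TOKEN_SUB: fResult = fLeft - fRight; break;
            case TOKEN_MUL: fResult = fLeft * fRight; break;
            case TOKEN_DIV: fResult = fLeft / fRight; break;
            case TOKEN_NUMBER: break;   // handled above
        }
        aStack.push_back( fResult );
    }
    assert( aStack.size() == 1 );
    return aStack.back();
}
```

For example, the formula =(1+2)*4 corresponds to the RPN token sequence 1 2 + 4 *, so the addition is evaluated before the multiplication without any parentheses in the token stream.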

Calc's document model structure

Hacking the code

Part 1 – Modify built-in cell function

Now that we've covered the basics of Calc's internal structure, let's get into the code and start making some changes. First, change directory into sc/source/core/tool/ and open the file named interpr1.cxx. Do a string search in your favorite editor to find the following function:
void ScInterpreter::ScPi()
{
    RTL_LOGFILE_CONTEXT_AUTHOR( aLogger, "sc", "er", "ScInterpreter::ScPi" );
    PushDouble(F_PI);
}

This is the implementation of the built-in function PI(), which normally returns an excellent approximation of the famous constant. Let's change it to match some eventual legal requirement:

void ScInterpreter::ScPi()
{
    RTL_LOGFILE_CONTEXT_AUTHOR( aLogger, "sc", "bill246", "ScInterpreter::ScPi" );
    PushDouble(3.2);
}

Then save the file and rebuild the sc module. After you replace the existing sc shared library, named libsc.so, in the install directory with the one you just built, start up Calc and type =PI(). You should get the return value mandated by bill246.

Let's do another one while we're on a roll.

Part 2 – Adding callback to mouse click event

This time, we are going to intercept a triple-click mouse event and do something for that event. Let's make it so that it will select all cells from A1 to the current cursor position when triple-clicked. We'll call this new feature a "triple click select".

Open the file located in sc/source/ui/view/gridwin.cxx, and do a string search to locate the method named HandleMouseButtonDown(...) of class ScGridWindow. Class ScGridWindow is a derived class of VCL's Window class, and represents the outermost Calc window. It receives all keyboard and mouse events and makes appropriate event callbacks. There is normally only one instance of ScGridWindow, but there can be multiple instances when the view is split.

About 40 lines into the definition of this method you'll see the following if-statement block:

if (pScMod->IsModalMode(pViewData->GetSfxDocShell()))
{
    Sound::Beep();
    return;
}

Let's add the following block of code after this block:

if ( rMEvt.GetClicks() == 3 )
{
    // Triple-click received.
    pViewData->GetView()->TripleClickSelect();
    return;
}

With this code, a triple mouse-click event will get intercepted and the method named TripleClickSelect() will get called to handle the event. Since the method GetView() returns a pointer to the instance of the ScDBFunc class, all we need to do now is to implement TripleClickSelect() in one of ScDBFunc's base classes so that the method will be visible to ScDBFunc. Since this triple-click-select feature is not exactly a DB related functionality, we'll add it to its immediate base class ScViewFunc.

First we need to add its prototype to the header file of ScViewFunc. The header file is located in sc/source/ui/inc/viewfunc.hxx. Open it in your editor, and add the following declaration:

void TripleClickSelect();

Make sure that it's a public method, because another class needs to call it. OK, we now have the prototype, so let's add its definition. All method definitions of ScViewFunc are found in files named viewfunc.cxx, viewfun2.cxx, ..., viewfun7.cxx. Though it makes absolutely no difference which file receives the definition of TripleClickSelect, let's just pick the last file in the series. Change directory into sc/source/ui/view and open viewfun7.cxx. Once it is open, move to the very end of the file and add the following code:

void ScViewFunc::TripleClickSelect()
{
    ScViewData* pViewData = GetViewData(); // get pointer to ScViewData

    // get the position of the current cursor
    SCTAB nTab = pViewData->GetTabNo();    // get sheet index
    SCCOL nCol = pViewData->GetCurX();     // get column index
    SCROW nRow = pViewData->GetCurY();     // get row index

    ScMarkData& rMark = pViewData->GetMarkData();
    if ( rMark.IsMarked() )
    {
        // Don't mark twice if already marked.
        ScRange aMarkRange;
        rMark.GetMarkArea( aMarkRange );
        if ( aMarkRange == ScRange(0, 0, nTab, nCol, nRow, nTab) )
            return;
    }

    // Do the selection.
    DoneBlockMode(); // finish previous selection if any
    InitBlockMode( 0, 0, nTab );
    MarkCursor( nCol, nRow, nTab );
    SelectionChanged();
}

When you're done, save the file, rebuild the sc module and swap the shared library as you did in Part 1. Now, start up Calc and see what a triple-click on a randomly-picked cell does. Pretty cool, huh?

Now that we've implemented this nifty new feature, let's go over this code to see how we actually implemented it. First and foremost, we need to know the current position of the cursor. Since ScViewData keeps track of current cursor positions, we need to get a pointer to the instance of ScViewData first.

ScViewData* pViewData = GetViewData(); // get pointer to ScViewData

Now that we have the pointer, let's query the current position of the cursor. Here, you need to get three values: the column position, the row position, and the sheet index, all of which are available from ScViewData.

// get the position of the current cursor
SCTAB nTab = pViewData->GetTabNo(); // get sheet index
SCCOL nCol = pViewData->GetCurX();  // get column index
SCROW nRow = pViewData->GetCurY();  // get row index

SCTAB, SCCOL, and SCROW are integer data types meant for sheet index, column index, and row index, respectively. As of 2.0.2, SCTAB and SCCOL are typedefs of a 16-bit integer, and SCROW is a typedef of a 32-bit integer.
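The difference in width matters because row indices run much higher than column indices. The sketch below makes that concrete with fixed-width typedefs. SheetIndex, ColIndex, RowIndex, FitsIn and ToColIndex are hypothetical names, and treating the types as signed is an assumption of this example rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins; signedness is an assumption of this sketch.
typedef std::int16_t SheetIndex;   // plays the role of SCTAB
typedef std::int16_t ColIndex;     // plays the role of SCCOL
typedef std::int32_t RowIndex;     // plays the role of SCROW

// Returns true if nValue can be stored in T without loss.
template <typename T>
bool FitsIn( long nValue )
{
    return nValue >= std::numeric_limits<T>::min()
        && nValue <= std::numeric_limits<T>::max();
}

// Converts a wide value to a column index, checking the range first.
ColIndex ToColIndex( long nValue )
{
    assert( FitsIn<ColIndex>( nValue ) );
    return static_cast<ColIndex>( nValue );
}
```

Under these assumptions, a column index of 255 fits comfortably in 16 bits, while a row index of 65535 does not fit in a signed 16-bit integer and needs the wider type.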

The next if-statement block is there to avoid redundant selection when the desired cell range is already selected.

ScMarkData& rMark = pViewData->GetMarkData();
if ( rMark.IsMarked() )
{
    // Don't mark twice if already marked.
    ScRange aMarkRange;
    rMark.GetMarkArea( aMarkRange );
    if ( aMarkRange == ScRange(0, 0, nTab, nCol, nRow, nTab) )
        return;
}

Notice how we query the ScViewData instance again for a currently selected cell range. If there is already a selected cell range, we check its position and size to see if it matches the geometry of our target cell range. If it does, then simply return to the calling function because we don't want to select the same range twice.

The last code block does the actual highlighting of a cell range. Let's go over it line by line.

// Do the selection.
DoneBlockMode(); // finish previous selection if any
InitBlockMode( 0, 0, nTab );
MarkCursor( nCol, nRow, nTab );
SelectionChanged();

The first call DoneBlockMode ends a pre-existing selection (if any). This is a standard Calc behavior; when a new selection is made, the previously selected region must be un-selected unless the lock selection mode is on (i.e. Ctrl key is pressed). The next call InitBlockMode initiates a block selection mode with the anchor set to A1, and the following MarkCursor call moves the cursor to the current cursor location with the block selection mode still on. The last call SelectionChanged notifies Calc that the block selection is complete so that Calc can now perform post-processing operations associated with a cell-selection change.
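The anchor-then-extend sequence, together with the "already selected" check, can be modelled outside of Calc with a toy class. Everything here is hypothetical: ToyRange and ToySelection are invented for illustration, they work on a single sheet, and the method bodies only mimic the behavior described above rather than what the real view classes do internally.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical single-sheet range, compared field by field.
struct ToyRange
{
    int nCol1, nRow1, nCol2, nRow2;
    bool operator==( const ToyRange& r ) const
    {
        return nCol1 == r.nCol1 && nRow1 == r.nRow1
            && nCol2 == r.nCol2 && nRow2 == r.nRow2;
    }
};

// Hypothetical toy model of a block selection.
class ToySelection
{
public:
    ToySelection()
        : mbBlockMode( false ), mbMarked( false ),
          mnAnchorCol( 0 ), mnAnchorRow( 0 ), maRange() {}

    // Ends the selection in progress and clears the marked range.
    void DoneBlockMode() { mbBlockMode = false; mbMarked = false; }

    // Starts a new selection anchored at the given cell.
    void InitBlockMode( int nCol, int nRow )
    {
        mnAnchorCol = nCol;
        mnAnchorRow = nRow;
        mbBlockMode = true;
    }

    // Extends the selection from the anchor to the given cell.
    void MarkCursor( int nCol, int nRow )
    {
        assert( mbBlockMode );
        maRange.nCol1 = std::min( mnAnchorCol, nCol );
        maRange.nRow1 = std::min( mnAnchorRow, nRow );
        maRange.nCol2 = std::max( mnAnchorCol, nCol );
        maRange.nRow2 = std::max( mnAnchorRow, nRow );
        mbMarked = true;
    }

    bool IsMarked() const { return mbMarked; }
    const ToyRange& GetMarkArea() const { return maRange; }

private:
    bool mbBlockMode;
    bool mbMarked;
    int mnAnchorCol;
    int mnAnchorRow;
    ToyRange maRange;
};

// Returns true if a new selection was made, false if the range from the
// top-left cell to the cursor was already selected.
bool TripleClickSelect( ToySelection& rSel, int nCol, int nRow )
{
    const ToyRange aTarget = { 0, 0, nCol, nRow };
    if ( rSel.IsMarked() && rSel.GetMarkArea() == aTarget )
        return false;

    rSel.DoneBlockMode();
    rSel.InitBlockMode( 0, 0 );
    rSel.MarkCursor( nCol, nRow );
    return true;
}
```

Calling TripleClickSelect twice with the same cursor position selects the range the first time and returns early the second time, which is the redundancy the early return in the real method avoids.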

That's it. We're done!

Now it's your turn!

If you've followed this guide up to this point, then you probably have a good glimpse of what it's like to hack this fabulous spreadsheet program. Although we did not cover all aspects of Calc's internals due to space constraints, we're hopeful that you've gained enough hands-on experience to get a good jump start. Now it's your turn to extend Calc into an even more fabulous spreadsheet program!

If you have any questions, be sure to ask us on dev@sc.openoffice.org. Hope to see you there!

About the author

The initial author of this document is Kohei Yoshida (C) 2006. He is a software engineer at SlickEdit, a company that develops a cross-platform & multi-language editor for power programmers. He is also the current maintainer of the Optimization Solver for Calc. He can be reached at kohei@openoffice.org.
