Calc/Implementation/Spreadsheet Functions

From Apache OpenOffice Wiki
< Calc‎ | Implementation
Revision as of 16:02, 10 December 2010 by Dr (Talk | contribs)

Jump to: navigation, search

Procedure to add a new Calc spreadsheet function.


Let's assume you want to implement a new spreadsheet function and the function was defined by the OASIS OpenDocument Format Formula subcommittee, see latest revision of the specification (draft). Let's further assume the function's name is MYFUNC and will take 2 parameters, of which the second parameter is optional and defaulted to 0, and returns a number, as following:

Syntax: MYFUNC( Number Param1 [ ; Number Param2 = 0 ] )
Returns: Number

Make the formula compiler know the function


sc/inc/compiler.hrc up to and including version DEV300_m38

These are the defines used by the resources for function names and the Function Wizard, and the numerical values of OpCode for the formula compiler and interpreter. Note that once defined the names must not be changed because they are used by the localization tools as identifiers.

Add a new define, in this case for 2 parameters append it to the section for functions with more than 1 parameter near the end of the file. Name the define SC_OPCODE_MYFUNC and insert it right before the define of SC_OPCODE_STOP_2_PAR, give it the value SC_OPCODE_STOP_2_PAR had, and increment the values of SC_OPCODE_STOP_2_PAR and SC_OPCODE_LAST_OPCODE_ID. If before the section looked like

#define SC_OPCODE_STOP_2_PAR        393
#define SC_OPCODE_LAST_OPCODE_ID    392      /* last OpCode */

it should then be

#define SC_OPCODE_MYFUNC            393
#define SC_OPCODE_STOP_2_PAR        394
#define SC_OPCODE_LAST_OPCODE_ID    393      /* last OpCode */


sc/inc/opcode.hxx up to and including version DEV300_m38

Here the OpCodeEnum values are defined. Note that in a non-product build (--enable-dbgutil during configure) there is a typedef OpCodeEnum OpCode; to show enum names in the debugger, while in a product build it is typedef OpCodeEnum USHORT; to save some memory, since compilers tend to produce an int for an enum.

Find a "right" place for the new enum. Although the way the definitions are setup the placement doesn't matter, there are sections with different topics, such as String functions and Statistical functions. Maybe the correct place for MYFUNC would be under miscellaneous. Best practice is to add a new OpCode to the end of such section. Name the OpCode ocMyFunc and add the line

        ocMyFunc            = SC_OPCODE_MYFUNC,


sc/source/core/src/compiler.src up to and including version DEV300_m38

These are the resources for function names. There are 3 resource bundles:

English UI display names. These get localized for the UI of other languages.
These English names are used internally to store/load ODF v1.0/v1.1 and for API XFunctionAccess. Usually the name is identical to that in RID_SC_FUNCTION_NAMES. Once defined and "in the wild", the name must not be changed.
These English names are used internally to store/load ODFF aka OpenFormula as of ODF v1.2. Once defined, the name must not be changed.

The new function name must be defined for all 3 resource bundles.

To the end of

        Text [ en-US ] = "MYFUNC" ;

The [ en-US ] field tells the localization tools that the name may be localized.

To the end of

    String SC_OPCODE_MYFUNC { Text = "MYFUNC" ; };

The absence of the [ en-US ] field tells the localization tools that the name must not be localized.

To the end of

    String SC_OPCODE_MYFUNC { Text = "MYFUNC" ; };

Again, the absence of the [ en-US ] field tells the localization tools that the name must not be localized.

The compiler knows the function

After having added the necessary entries to formula/inc/formula/compiler.hrc, formula/inc/formula/opcode.hxx, and formula/source/core/resource/core_resource.src (sc/inc/compiler.hrc, sc/inc/opcode.hxx, and sc/source/core/src/compiler.src respectively), the formula compiler now knows the new function name and is able to compile an expression where it is used, and it can be stored in and loaded from a document. Of course nothing else works, the interpreter doesn't know how to handle it and will generate an error if encountered. The function and its parameters will not appear in the Function Wizard.

Publish the function to the Function Wizard


Till DEV300m90

Function groups (categories) and HelpIDs for functions are defined here. Lookup the group where functions of ID_FUNCTION_GRP_... matching the new function's category are defined and append an entry, incrementing the offset of the last entry by one. For our function that could be



Since DEV300m91.

With Issue 111874 (not issue 111784 as in comment in CWS changehid) the helpid system has been changed from numerical to byte string. The help ids are no longer defined in file scfuncs.hrc but in file helpids.h

For our function that could be

#define HID_FUNC_MYFUNC             "SC_HID_FUNC_MYFUNC"

Notice the prefix SC_. Use a position that is similar to the position of the new function in the other files like compiler.hrc or core_resource.src.


This large resource file contains all elements necessary to display functions in the Function Wizard. It defines the function's short description, the number of parameters, whether they are optional, and the description of each parameter. For a detailed description of fields see the comment on top of the file.

Add the new function to the end of one of the two resource blocks RID_SC_FUNCTION_DESCRIPTIONS1 or RID_SC_FUNCTION_DESCRIPTIONS2. Which one doesn't really matter, but functions should be more or less equally distributed over the two blocks. There are two blocks because one resource block couldn't have more than 64k data, just another legacy from Win16 times.. looked up again right now (2008-08-23) in the meantime this restriction seems to have been obsoleted, which would have to be verified by a test build using one resource block only though.

        String 1 // Description
            Text [ en-US ] = "Calculates foo, optionally using bar." ;
        ExtraData =
            U2S( HID_FUNC_MYFUNC );
            2;  0;  1;
        String 2 // Name of Parameter 1
            Text [ en-US ] = "Number" ;
        String 3 // Description of Parameter 1
            Text [ en-US ] = "A value for which foo is to be calculated." ;
        String 4 // Name of Parameter 2
            Text [ en-US ] = "Number" ;
        String 5 // Description of Parameter 2
            Text [ en-US ] = "The bar value." ;

Some comments on the ExtraData block, for more details see scfuncs.src:

ExtraData =
    0;                      // The function is not suppressed and available in UI.
    ID_FUNCTION_GRP_MATH;   // The category in which the function is displayed.
    U2S( HID_FUNC_MYFUNC ); // The HelpID of this function.
    2;  0;  1;              // 2 parameters, of which the 2nd is optional.
    0;                      // None of the parameters are suppressed in the UI.


Here the HelpIDs to be used within the Function Wizard are propagated to the help system. Go to the end of the section containing HID_FUNC_... and append an entry,

hidspecial HID_FUNC_MYFUNC          { HelpID = HID_FUNC_MYFUNC; };

The Function Wizard knows the function

Now the Function Wizard can display the function and its parameters, and online-help may be authored.

Let the interpreter handle the function


Add a member method to class ScInterpreter that will handle the new function. Here this would be

void ScMyFunc();

Take care that the new member function doesn't resemble the name of some already existing class, for example the method for the ADDRESS() function is named ScAddressFunc() because ScAddress(), if called without this-> prefix, would be the ctor of class ScAddress instead. If in doubt, add ...Func() to the name.


In method ScInterpreter::Interpret() add to switch ( eOp ) the call to the member function for the OpCode:

                case ocMyFunc           : ScMyFunc();                   break;


Pick one of the interpr?.cxx source files where the new method may fit. There is no general advice which file exactly that might be, be sensible. As a guide line

Basic math functions and informational functions.
Date functions and financial functions.
Statistical functions.
Interpreter managing and stack related functions, you shouldn't need to add anything there.
Matrix functions.
Some number crunching that in the past had to be compiled without optimizations, though in the mean time code changed and this probably is not necessary anymore.

For our function, since that expects two, one optional, numerical scalar arguments, this would be:

void ScInterpreter::ScMyFunc()
    BYTE nParamCount = GetByte();
    // The MustHaveParamCount...() functions check the number of parameters and
    // if they do not fit push an error on the stack, if the method fails
    // (returns false) we return immediately.
    if ( !MustHaveParamCount( nParamCount, 1, 2 ) )
    // Arguments are popped from the stack from right to left.
    double fParam2;
    if (nParamCount == 2)
        fParam2 = GetDouble();
        fParam2 = 0.0;
    double fVal = GetDouble();
    if ( /* does fVal meet all constraints */ )
        double fResult = /* calculate foo */ ;
        PushDouble( fResult );

The not so easy case of non-scalar arguments

StackVar argument types

If the new function would handle parameters that are not scalar values, for example a NumberSequence or matrix/array, they would have to be treated explicitly, checking and reacting on the type of each argument. Lookout for functions that use the GetType() call and handle StackVar svDoubleRef or similar. Ask on the dev@sc mailing list if in doubt.

The most common StackVar types obtainable with GetStackType()are:

A double value.
A literal string text.
A single cell reference.
A cell range reference.
A list of cell range references.
A matrix/array.

There are a few other types available with GetRawStackType() for specific situations, these are converted to svDouble in GetStackType() and returned as 0.0 by PopDouble() and GetDouble():

An empty parameter without value, if that should not be converted to 0.0 use GetDoubleWithDefault(double).
A previous expression returned an empty cell, it may depend on context whether that is to be interpreted as double or string.

The general template to act on each parameter in sequence from right to left is:

    // Note: use short instead of BYTE when doing the (nParamCount-- > 0) loop below.
    short nParamCount = GetByte();
    if (MustHaveParamCountMin( nParamCount, 1))     // In case your function needs at least one parameter.
        size_t nRefInList = 0;
        while(nParamCount-- > 0)
            if (nGlobalError)
                switch (GetStackType())
                    case svDouble :
                            double fVal = PopDouble();
                            if (!nGlobalError)
                    case svString :
                            String aStr( PopString());
                            if (!nGlobalError)
                    case svSingleRef :
                            ScAddress aAdr;
                            PopSingleRef( aAdr);
                            if (!nGlobalError)
                                ScBaseCell* pCell = GetCell( aAdr);
                                if (HasCellValueData( pCell))
                                    double fVal = GetCellValue( aAdr, pCell);
                                    if (!nGlobalError)
                                    String aStr;
                                    GetCellString( aStr, pCell);
                                    if (!nGlobalError)
                    case svDoubleRef:
                    case svRefList:
                            ScRange aRange;
                            PopDoubleRef( aRange, nParamCount, nRefInList);
                            if (!nGlobalError)
                                double fVal;
                                USHORT nErr = 0;
                                ScValueIterator aValIter( pDok, aRange );
                                if (aValIter.GetFirst( fVal, nErr))
                                        if (nErr)
                                    } while (aValIter.GetNext( fVal, nErr));
                                SetError( nErr);
                    case svMatrix:
                            ScMatrixRef pMat = GetMatrix();
                            if (pMat)
                            // else: GetMatrix did set errIllegalParameter
                        SetError( errIllegalParameter);
        Push...(...);  // Don't forget to push a result ;-)

Of course your function may require specific treatment of parameters depending on parameter types of other parameters, or bail out if a parameter type does not match an expected type, this template is meant as guidance only.


Describes how parameters are to be treated if the function is used in an array context expression, AKA array formula, where cell range references may be treated differently from non-array context. If the function does not accept other than scalar arguments, nothing has to be done here and the function does not need an entry. Else the type of each parameter has to be set to one of:

Function expects a single scalar value.
A cell range reference is passed as reference and not converted to array.
In array context a cell range reference is converted to array.
A cell range reference is always converted to an array and the ForceArray context is propagated to all functions and operators in this parameter's expression.
A cell range reference is not converted to an array, but a ForceArray context is propagated to all functions and operators in this parameter's expression. Only LOOKUP() uses this, other functions should not need it.

For a more detailed description of these types please read the comments for ScParameterClassification::Type in sc/source/core/inc/parclass.hxx. Ask on the dev@sc mailing list if in doubt.

Set the ScParameterClassification::CommonData.bRepeatLast member variable to true if the function accepts multiple parameters that are of the same type as the last one specified, e.g. for SUM() that accepts multiple Reference parameters. If the function does not accept multiple parameters of the same type, set the value to false.

Microsoft Excel® import and export

If the new function is also supported by Microsoft Excel®, for import/export it has to be added to the filter code as well. supports various versions of the file format used by Excel.

  • Currently, import and export of the old binary formats (file extension ".xls", internal names "BIFF2" for Excel 2.x, "BIFF3" for Excel 3.0, "BIFF4" for Excel 4.0, "BIFF5" for Excel 5.0 and Excel 95, and "BIFF8" for Excel 97-2003) are implemented in the sc code module in the directory sc/source/filter/excel. While the import filters support all BIFF versions, there are export filters for BIFF5 and BIFF8 only.
  • Import of the new Office Open XML formats (file extensions ".xlsx" and ".xlsm" used by Excel 2007 and later, internal name "OOXML") and the new binary format (file extension ".xlsb" used by Excel 2007 and later, internal name "BIFF12") are implemented in the oox code module in the directory oox/source/xls.

In the future, it is planned to move the old BIFF2-BIFF8 filters to the oox code module too, to be able to share more source code for all filters. The import of functions and formulas will take benefit from that too.

Old binary file formats (BIFF2-BIFF8)


Import and export of BIFF formulas is spread over a number of files in the sc code module.

  • Source code used by the import and export filters is located in the files sc/source/filter/inc/xlformula.hxx and sc/source/filter/excel/xlformula.cxx. The cxx file contains the most important detail for functions: the function tables, which will be described in detail below.
  • The source code to read formulas from BIFF is located in sc/source/filter/inc/excform.hxx, sc/source/filter/excel/excform.cxx (for BIFF2-BIFF5), and sc/source/filter/excel/excform8.cxx (for BIFF8). Note that these source files have not been touched seriously for more than a decade, so the code quality might not be very high.
  • The source code to write formulas to BIFF is located in sc/source/filter/inc/xeformula.hxx and sc/source/filter/excel/xeformula.cxx.

It is important to know the BIFF version the function has been added to. Some functions are not supported by very old versions of Excel, and some functions even change the number of supported parameters between BIFF versions.

The function tables - Basics

The file sc/source/filter/excel/xlformula.cxx contains a function table for each BIFF version, and an additional table for functions unknown to Excel. Every line in these tables describes a single spreadsheet function and is of type XclFunctionInfo as defined in sc/source/filter/inc/xlformula.hxx. If a function is added in a table for a specific BIFF version, it will be used for all newer BIFF versions too, there is no need to repeat the function description in the following tables (unless the function changes somehow in a specific BIFF version).

We will continue to take a look at our example function MYFUNC. We will assume that this function has been introduced in BIFF3 with only the first (required) parameter, and that the second optional parameter has been added in BIFF8. Furthermore, we will assume that the function is built-in in Excel and has the BIFF function identifier 200 (later on this page, handling of add-in functions will be described too).

Because the function changes in BIFF8 (new optional second parameter), we have to add an entry in the function table for BIFF3, and another entry in the function table for BIFF8 which will override the former entry, if the imported/exported file format is BIFF8. First, we have to add a new line in the function table for BIFF3, called saFuncTable_3. By convention, the tables are sorted by BIFF function identifier (second column in the tables), here we should use the appropriate position for our BIFF function identifier 200.

  • The first entry in the line is the internal op-code of the function, which is ocMyFunc here:
    { ocMyFunc,
  • The second entry in the line is the BIFF function identifier:
    { ocMyFunc, 200,
  • The third entry in the line is the minimum parameter count of the function. MYFUNC expects at least one parameter:
    { ocMyFunc, 200, 1,
  • The fourth entry in the line is the maximum parameter count of the function. MYFUNC in BIFF3 expects at most one parameter:
    { ocMyFunc, 200, 1, 1,
  • The fifth entry in the line is the type of the value the function returns. Usually, a function returns a single value (scalar), which might be a number or a string. But a few functions return an array of values (e.g. the MTRANS function that transposes a matrix), or a cell range address (e.g. the INDIRECT function that converts a string to a range address). Allowed values for this entry are V for scalar values, A for arrays, and R for range addresses (references). MYFUNC returns a scalar value, thus we have to add the V type:
    { ocMyFunc, 200, 1, 1, V,
  • The sixth entry in the line is a C array describing the type of all function parameters. This is the most complicated part of the function description, as sometimes the parameter type is not obvious. Simple functions taking scalar values usually will have parameters of type VR which is short for "value/repeated". See the inline documentation of the structure XclFuncParamInfo in the file sc/source/filter/inc/xlformula.hxx for more details. Our function takes one scalar value parameter:
    { ocMyFunc, 200, 1, 1, V, { VR },
  • The seventh entry in the line contains additional flags that control the behaviour of the formula filters. Later on, these flags will be described. Our function does not need any special behaviour:
    { ocMyFunc, 200, 1, 1, V, { VR }, 0,
  • The eighth entry in the line is a C string specifying an alternative function name to be used in BIFF for add-in functions. If not used, it can be set to null:
    { ocMyFunc, 200, 1, 1, V, { VR }, 0, 0 },

That's it. The function will now be imported correctly from BIFF3, BIFF4, and BIFF5 files. Due to the fact that BIFF8 adds an optional parameter (which changes the binary representation of the function), import from and export to BIFF8 will not work yet. We have to add an entry in the function table for BIFF8, called saFuncTable_8. Again, we should insert the new line according to the BIFF function identifier.

  • The first difference in the new line is the number of supported parameters. The function still requires one parameter, but it also accepts two parameters. Thus, we have to change the fourth entry from 1 to 2:
    { ocMyFunc, 200, 1, 2, V, { VR }, 0, 0 },
  • Next, we have to add the type of the second parameter. It is also of type VR:
    { ocMyFunc, 200, 1, 2, V, { VR, VR }, 0, 0 },
  • As an optimization or simplification, it is allowed to leave out the parameter type of trailing parameters, if they are equal to the type of their predecessor. In this case, we may have left the line as following:
    { ocMyFunc, 200, 1, 2, V, { VR }, 0, 0 },

This is especially useful for functions that take more arguments, e.g. the REPLACE function that takes 4 parameters of type VR:

    { ocReplace, 119, 4, 4, V, { VR }, 0, 0 },

Now we are finished. The import and export filters are fully aware of the new function MYFUNC. On export, if the BIFF version does not support a function, or if the number of arguments is not supported by the current BIFF version, the export filter will replace the entire function with NA(). For example, the BIFF8 export filter will be able to write the formulas =MYFUNC(1) and =MYFUNC(1;2), but the BIFF5 export filter will replace the function with NA() in the second formula because two parameters are not supported here.

The function tables - Advanced

New file formats (OOXML and BIFF12)

As if that wasn't enough, nearly the same has to be added to the new filter for Microsoft Excel® that shares some mechanisms with the new Microsoft Office Open XML (MOOXML) filter. The implementation in sc/source/filter/{excel,xcl97}/ will be deprecated later. For details of the new tables see declaration of struct FunctionInfo in oox/inc/oox/xls/formulabase.hxx.

Personal tools