String Handling in Formula Compiler

From Apache OpenOffice Wiki


Compiling a formula calls toUpper() almost unconditionally on every parsed token, and a disproportionate share of the time is spent in the underlying i18n routines. Much of this work can be avoided:

  • Tokens for operators, separators, parentheses and any other token that contains no letters do not need toUpper() at all.
  • When loading ODF documents, a simplified ASCII toUpper() is sufficient, since all function names are stored under their English names.
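
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in ScCompiler: the names needsUpper, asciiToUpperInPlace, localeToUpper, upperToken and UpperStats are hypothetical, std::string stands in for the office's own String class, and localeToUpper is only a placeholder for the locale-aware CharClass::toUpper path.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical counters, used only to show which path a token takes.
struct UpperStats
{
    int skipped = 0;  // no conversion needed
    int ascii   = 0;  // cheap ASCII-only conversion
    int locale  = 0;  // expensive locale-aware conversion
};

// True if case conversion could change the token: it contains a lowercase
// ASCII letter or a non-ASCII byte. Operators, separators, parentheses,
// digits and already upper-case ASCII tokens return false.
static bool needsUpper(const std::string& token)
{
    for (unsigned char c : token)
    {
        if (c >= 0x80 || (c >= 'a' && c <= 'z'))
            return true;
    }
    return false;
}

// ASCII-only upper-casing, done in place so no temporary string is created.
static void asciiToUpperInPlace(std::string& token)
{
    for (char& c : token)
    {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
}

// Placeholder for the locale-aware conversion (assumption: the real code
// would call into the i18n framework here). Returns a new string, which
// models the temporary that the cheap paths avoid.
static std::string localeToUpper(const std::string& token)
{
    std::string result(token);
    for (char& c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}

// Upper-case a token only when needed, choosing the cheapest valid path.
// englishNames is true when function names are known to be English,
// as when loading an ODF document.
static void upperToken(std::string& token, bool englishNames, UpperStats& stats)
{
    if (!needsUpper(token))
    {
        ++stats.skipped;
        return;
    }
    if (englishNames)
    {
        asciiToUpperInPlace(token);
        ++stats.ascii;
        return;
    }
    token = localeToUpper(token);
    ++stats.locale;
}
```

With this arrangement a token such as "+" or "A1:B2" is left untouched, "sum" read from an ODF file goes through the ASCII path, and only tokens typed in a localized UI reach the expensive conversion.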

A test case document is attached to Issue 99828. It contains two columns of formulas, 64k rows each (131,072 formula cells in total); each formula consists of a function name, some references, a value and a few operators and separators. Profiling gave:

Level   Method                      Instr. (incl.)     %       Called
---------------------------------------------------------------------
        Application::Execute        19,000,207,876
        ScDocShell::LoadXML         18,500,074,143
1       ScFormulaCell::CompileXML    5,119,853,402  26.9      131,072
2       ScCompiler::CompileString    3,664,394,563  19.2      131,072
3       ScCompiler::NextNewToken     3,171,730,381  16.6      983,040
4       String::~String                276,824,113   1.5    1,441,792
4       CharClass::toUpper             948,874,151   5.0      589,824

After eliminating the unnecessary toUpper() calls and rearranging the code to create fewer temporary strings, the results were:

        Application::Execute        17,961,981,399
        ScDocShell::LoadXML         17,464,830,027
1       ScFormulaCell::CompileXML    4,110,890,473  22.8      131,072
2       ScCompiler::CompileString    2,641,827,274  14.6      131,072
3       ScCompiler::NextNewToken     2,149,151,806  11.9      983,040
4       String::~String                171,048,974   0.9      983,040
        CharClass::toUpper                       0   0.0            0

which is an overall reduction of roughly 5.6% in instructions executed under LoadXML() (from about 18.50 to 17.46 billion).

Implemented in CWS DEV300 calcperf04  
