String Handling in Formula Compiler
From Apache OpenOffice Wiki
The formula compiler calls toUpper() almost unconditionally on every token it parses, and spends a disproportionate amount of time in the underlying i18n routines. Two observations make most of these calls avoidable:
- Operators, separators, parentheses and any other tokens that contain no letters do not need toUpper() at all.
- When loading ODF documents, a simplified ASCII-only toUpper() is sufficient, since all function names are stored under their English names.
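The two points above can be sketched as a small decision helper. This is only an illustration under stated assumptions: the names UpperMode, chooseUpperMode and asciiToUpper are hypothetical and do not come from the Calc sources, and plain std::string stands in for the string class the compiler actually uses. The locale-aware path is deliberately left to the caller.

```cpp
#include <string>

// Hypothetical names for illustration; not taken from the Calc sources.
enum class UpperMode
{
    None,   // token has no letters: skip upper-casing entirely
    Ascii,  // loading ODF: English names, ASCII-only upper-casing suffices
    Locale  // anything else: fall back to the locale-aware i18n routine
};

inline bool isAsciiLetter(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Decide how much upper-casing work a token needs.
// Bytes >= 0x80 are treated as "may be a letter", since this sketch
// cannot classify non-ASCII text without the i18n routines.
UpperMode chooseUpperMode(const std::string& token, bool loadingOdf)
{
    bool mayContainLetter = false;
    for (unsigned char c : token)
    {
        if (isAsciiLetter(c) || c >= 0x80)
        {
            mayContainLetter = true;
            break;
        }
    }
    if (!mayContainLetter)
        return UpperMode::None;
    return loadingOdf ? UpperMode::Ascii : UpperMode::Locale;
}

// ASCII-only upper-casing: no locale lookup; non-ASCII bytes are untouched.
std::string asciiToUpper(std::string token)
{
    for (char& c : token)
    {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return token;
}
```

With this shape, a token such as "+" or "(" is classified as UpperMode::None and never reaches the i18n layer, while a function name read from an ODF file only goes through the cheap ASCII loop.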
A test case document is attached to Issue 99828. It contains two columns of formulas with 64k (65,536) rows each; every formula consists of a function name, some references, a value and a few operators and separators. Profiling gave:
Level  Method                     Instr. (incl.)     %     Called
-----------------------------------------------------------------
       Application::Execute       19,000,207,876
       ScDocShell::LoadXML        18,500,074,143
1      ScFormulaCell::CompileXML   5,119,853,402  26.9    131,072
2      ScCompiler::CompileString   3,664,394,563  19.2    131,072
3      ScCompiler::NextNewToken    3,171,730,381  16.6    983,040
4      String::~String               276,824,113   1.5  1,441,792
4      CharClass::toUpper            948,874,151   5.0    589,824
After eliminating the unnecessary toUpper() calls and rearranging the code so that fewer temporary strings are created, the results were:
Level  Method                     Instr. (incl.)     %     Called
-----------------------------------------------------------------
       Application::Execute       17,961,981,399
       ScDocShell::LoadXML        17,464,830,027
1      ScFormulaCell::CompileXML   4,110,890,473  22.8    131,072
2      ScCompiler::CompileString   2,641,827,274  14.6    131,072
3      ScCompiler::NextNewToken    2,149,151,806  11.9    983,040
4      String::~String               171,048,974   0.9    983,040
       CharClass::toUpper                      0   0.0          0
which is an overall improvement of roughly 5% under LoadXML() (18,500,074,143 instructions before, 17,464,830,027 after).
Implemented in CWS DEV300 calcperf04