Difference between revisions of "Performance/Reorder Symbols For Libraries"

From Apache OpenOffice Wiki
Revision as of 09:17, 26 March 2009

The comprehensive analysis of the cold start up behavior of OpenOffice.org shows that file I/O is the main bottleneck. About 80% of the start up time is spent waiting for data from the disk. Most file I/O depends on library loading. This part describes what can be done to reduce I/O time for loading OpenOffice.org libraries. The main ideas are system independent but the solutions must be system/compiler specific. The following chapters describe in detail how we want to reorder code/data within the libraries.

Main idea

Normally the compiler and linker produce a library which consists of many object files. The order of the code/data depends on the strategy of the linker and the layout of the library format. During start up the application libraries are loaded on demand. Depending on the program flow, new code, and therefore new pages, are loaded from disk into memory. Unfortunately the linker doesn't know how the application accesses each library during the start up phase. Therefore the needed code/data is distributed all over the library, which causes many page faults and disk accesses.

Main Idea Library Optimization.png
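
The effect described above can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the 4 KiB page size, the CodeRange type and both function names are hypothetical and not part of any OpenOffice.org tool. It counts how many distinct pages must be loaded for a set of functions, once in their scattered positions and once packed at the start of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed page size for this illustration (4 KiB is typical on x86).
constexpr std::uint32_t kPageSize = 4096;

// A function inside a library image: offset from the image start and size in bytes.
struct CodeRange {
    std::uint32_t offset;
    std::uint32_t size;
};

// Counts the distinct pages that must be loaded to execute all given functions.
std::size_t pagesTouched(const std::vector<CodeRange>& functions) {
    std::set<std::uint32_t> pages;
    for (const CodeRange& f : functions) {
        if (f.size == 0) continue;  // an empty range touches no page
        std::uint32_t first = f.offset / kPageSize;
        std::uint32_t last = (f.offset + f.size - 1) / kPageSize;
        for (std::uint32_t p = first; p <= last; ++p) pages.insert(p);
    }
    return pages.size();
}

// Lays the same functions out back to back, as a reordered library would.
std::vector<CodeRange> packAtStart(const std::vector<CodeRange>& functions) {
    std::vector<CodeRange> packed;
    std::uint32_t next = 0;
    for (const CodeRange& f : functions) {
        packed.push_back({next, f.size});
        next += f.size;
    }
    return packed;
}
```

For four made-up functions at offsets 0x0100, 0x5000, 0x9f00 and 0x14000 with sizes of 200, 300, 400 and 100 bytes, pagesTouched reports 5 pages for the scattered layout and 1 page for the packed layout.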

System dependent solution

Windows

This chapter describes the solution for the Windows platform.

Microsoft Visual Studio 2008

OpenOffice.org uses the Microsoft Visual Studio 2008 C/C++ compiler suite for the Windows build, called wntmsci12[.pro]. Unfortunately Microsoft discontinued the Working Set Tuner application which was part of the Platform SDK. That application allowed developers to optimize the layout of application libraries. A successor called Smooth Working Set Tool is also not available for download.

So we have to look for a solution on our own. This also has the big advantage that the solution can be adapted to our needs. What options are available to help us reorder code/data in libraries? If you start the C/C++ compiler and linker with the help option you can see all supported options. The following section shows the options which can help us.

Compiler options

Microsoft (R) 32-Bit C/C++-Optimizing Compiler Version 15.00.30729.01 for 80x86
 
Copyright (C) Microsoft Corporation.  All rights reserved.
 
...
/Gh Enable _penter function call        
/GH Enable _pexit function call
/Gy Enable Function-Level Linking
...
 
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.
 
 Syntax: LINK [Options] [Files] [@Commandfile]
 
   Options:
      ...
      /ORDER:@Filename
      ...

The /Gh compiler option lets us get called on every function entry: the compiler adds a call to the hook function (_penter) at the start of each function. That means we have to rebuild all libraries that should be instrumented with the /Gh option set. The /GH option (_pexit hook) is useful if we want to measure timing, so we don't need it for recording function calls. The /Gy option is needed so that the linker can reorder symbols. Fortunately this option is already set on official OpenOffice.org builds.

According to Microsoft's documentation for the /Gh option, the hook function must be declared as naked. The function must also preserve the content of all registers.

void __declspec(naked) _cdecl _penter( void );

A function declared with the naked attribute doesn't have prolog or epilog code. This enables a developer to write custom prolog/epilog code using the inline assembler. The following skeleton can be used for our goal of recording all function calls during start up.

extern "C" void __declspec(naked) _cdecl _penter( void )
{
   _asm
   {
      push eax
      push ebx
      push ecx
      push edx
      push ebp
      push edi
      push esi
   }
 
   // TODO: Add code to determine the caller address and provide it
   // to a function which records the call.
 
   _asm
   {
      pop esi
      pop edi
      pop ebp
      pop edx
      pop ecx
      pop ebx
      pop eax
      ret
   }
}

What we have to do is retrieve the address of the function that called _penter. This can be done with a little calculation, as the return address is on the stack. See the following code.

extern "C" void __declspec(naked) _cdecl _penter( void )
{
    _asm
    {
        push eax
        push ebx
        push ecx
        push edx
        push ebp
        push edi
        push esi
 
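        // calculate the pointer to the return address
        // (7 saved registers * 4 bytes = 28 bytes above the stack pointer)
        mov  ecx, esp
        add  ecx, 28
 
        // retrieve return address from stack
        mov  eax, dword ptr[ecx]
 
        // subtract 5 bytes as the instruction for call _penter is 5 bytes long on 32-bit machines, e.g. E8 <00 00 00 00>
        sub  eax, 5
 
        // provide the function start address to forwardFunctionCall
        push eax
        call forwardFunctionCall
        // remove the argument from the stack again; this assumes that
        // forwardFunctionCall uses the cdecl calling convention (caller cleans up)
        add  esp, 4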
 
        pop esi
        pop edi
        pop ebp
        pop edx
        pop ecx
        pop ebx
        pop eax
        ret
    }
}

The implementation of the _penter function provides the start address of the called function to an external function which can be implemented by C++ code.
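
A minimal sketch of such an external function could look as follows. Only the name forwardFunctionCall is taken from the assembler code above. The signature (one pointer argument), the fixed-size buffer, its capacity and the two accessor functions are assumptions made for illustration. The sketch avoids memory allocation inside the hook path and simply drops calls once the buffer is full.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity; calls beyond it are dropped in this sketch.
constexpr std::size_t kMaxRecordedCalls = 1u << 16;

static std::uintptr_t g_recordedCalls[kMaxRecordedCalls];
static std::atomic<std::size_t> g_recordedCount{0};

// Called from _penter with the start address of the function that was entered.
// The signature is an assumption; it must match what _penter pushes on the stack.
extern "C" void forwardFunctionCall(void* functionAddress) {
    std::size_t slot = g_recordedCount.fetch_add(1, std::memory_order_relaxed);
    if (slot < kMaxRecordedCalls) {
        g_recordedCalls[slot] = reinterpret_cast<std::uintptr_t>(functionAddress);
    }
}

// Number of calls currently stored in the buffer.
std::size_t recordedCallCount() {
    std::size_t n = g_recordedCount.load(std::memory_order_relaxed);
    return n < kMaxRecordedCalls ? n : kMaxRecordedCalls;
}

// Returns the address stored at the given position (index < recordedCallCount()).
std::uintptr_t recordedCall(std::size_t index) {
    return g_recordedCalls[index];
}
```

The source file that contains this function must itself be compiled without /Gh. Otherwise the compiler would insert a call to _penter into forwardFunctionCall as well, and the hook would call itself recursively.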

Determine what functions are called during start up

With the help of the _penter function we are able to get the addresses of the functions which are called during start up. The _penter function calls a second function which can be implemented in C++. That function has to implement the following tasks:

  • Determine to which module the address belongs
  • Maintain a counter for every module and tag every newly detected function with the current count. This gives us the opportunity to sort the function symbols by their call sequence.
  • Maintain an access counter for every function to give us a chance to sort the function symbols by their importance.
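
The three tasks above can be sketched as a small recorder class. This is an illustration under stated assumptions, not the prototype code: the names ModuleRange, FunctionStats and CallRecorder are hypothetical, and the module address ranges are simply passed in by the caller, because how they are obtained is platform specific.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Address range occupied by a loaded module (supplied by the caller).
struct ModuleRange {
    std::string name;
    std::uintptr_t begin;  // first address of the module
    std::uintptr_t end;    // one past the last address
};

// What the recorder knows about one function.
struct FunctionStats {
    std::uint32_t firstCallOrder;  // position in the module's first-call sequence
    std::uint64_t callCount;       // how often the function was entered
};

class CallRecorder {
public:
    explicit CallRecorder(std::vector<ModuleRange> modules)
        : modules_(std::move(modules)) {}

    // Records one function entry; returns false if no module contains the address.
    bool record(std::uintptr_t address) {
        for (const ModuleRange& m : modules_) {
            if (address < m.begin || address >= m.end) continue;
            PerModule& pm = perModule_[m.name];
            std::uintptr_t offset = address - m.begin;
            auto it = pm.functions.find(offset);
            if (it == pm.functions.end()) {
                // New function: tag it with the module's current counter value.
                pm.functions[offset] = FunctionStats{pm.nextOrder++, 1};
            } else {
                ++it->second.callCount;
            }
            return true;
        }
        return false;
    }

    // Statistics of one module, keyed by offset from the module start.
    const std::map<std::uintptr_t, FunctionStats>& functionsOf(const std::string& module) {
        return perModule_[module].functions;
    }

private:
    struct PerModule {
        std::uint32_t nextOrder = 0;
        std::map<std::uintptr_t, FunctionStats> functions;
    };
    std::vector<ModuleRange> modules_;
    std::map<std::string, PerModule> perModule_;
};
```

Storing offsets relative to the module start instead of absolute addresses keeps the trace valid even if a module is loaded at a different base address in the next run.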
Use the map file information to map an address to a symbol

This information must be stored in trace files that can be analyzed by an additional tool. This tool will use the module's map file to determine the symbol from the address, and it can also detect whether the symbol is static or not. Static symbols cannot be moved by the linker.

 splmi
 
 Timestamp is 49c8eba8 (Tue Mar 24 15:18:16 2009)
 
 Preferred load address is 10000000
 
 Start         Length     Name                   Class
 0001:00000000 000128f0H .text                   CODE
 0001:000128f0 00003e58H .text$x                 CODE
 0001:00016750 0000010cH .text$yc                CODE
 0001:00016860 000000d3H .text$yd                CODE
 0002:00000000 000007e4H .idata$5                DATA
 0002:000007e4 00000004H .CRT$XCA                DATA
 0002:000007e8 0000001cH .CRT$XCU                DATA
 0002:00000804 00000004H .CRT$XCZ                DATA
 0002:00000808 00000004H .CRT$XIA                DATA
 0002:0000080c 00000004H .CRT$XIAA               DATA
 0002:00000810 00000004H .CRT$XIC                DATA
 0002:00000814 00000004H .CRT$XIZ                DATA
 0002:00000820 00001c40H .rdata                  DATA
 0002:00002460 0000004dH .rdata$debug            DATA
 0002:000024b0 000013d8H .rdata$r                DATA
 0002:00003890 0000038cH .rdata$sxdata           DATA
 0002:00003c1c 00000004H .rtc$IAA                DATA
 0002:00003c20 00000004H .rtc$IZZ                DATA
 0002:00003c24 00000004H .rtc$TAA                DATA
 0002:00003c28 00000004H .rtc$TZZ                DATA
 0002:00003c30 000044acH .xdata$x                DATA
 0002:000080dc 0000012cH .idata$2                DATA
 0002:00008208 00000014H .idata$3                DATA
 0002:0000821c 000007e4H .idata$4                DATA
 0002:00008a00 00005064H .idata$6                DATA
 0002:0000da70 000000b9H .edata                  DATA
 0003:00000000 000009e8H .data                   DATA
 0003:000009e8 00000628H .bss                    DATA
 0004:00000000 000000ecH .rsrc$01                DATA
 0004:000000f0 00000278H .rsrc$02                DATA
 
  Address         Publics by Value              Rva+Base       Lib:Object
 
 0000:00000000       __except_list              00000000     <absolute>
 0000:000000e3       ___safe_se_handler_count   000000e3     <absolute>
 0000:00009876       __ldused                   00009876     <absolute>
 0000:00009876       __fltused                  00009876     <absolute>
 0000:00000000       ___ImageBase               10000000     <linker-defined>
 0001:00000000       ??0SplashScreen@desktop@@AAE@ABV?$Reference@VXMultiServiceFactory@lang@star@sun@com@@@uno@star@sun@com@@@Z 10001000 f   splash.obj
 0001:000002e0       ??1OUString@rtl@@QAE@XZ    100012e0 f i splash.obj
 0001:00000300       ??_ESplashScreen@desktop@@EAEPAXI@Z 10001300 f i splash.obj
 ...
 
 entry point at        0001:00012246
 
 Static symbols
 
 0001:fffff000       __unwindfunclet$?copy@OUString@rtl@@QBE?AV12@JJ@Z$0 10000000 f   cfgfilter.obj
 0001:fffff000       __unwindfunclet$??0Exception@uno@star@sun@com@@QAE@ABVOUString@rtl@@ABV?$Reference@VXInterface@uno@star@sun@com@@@1234@@Z$0 10000000 f   migration.obj
 ...
 0001:00005f60       ?getSupportedServiceNames@@YA?AV?$Sequence@VOUString@rtl@@@uno@star@sun@com@@H@Z 10006f60 f   services_spl.obj
 0001:000068d9       ?_setBold@desktop@@YAXAAVFixedText@@@Z 100078d9 f   pages.obj
 ...
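
Reading the symbol lines of such a map file can be sketched as follows. The code is an assumption-laden illustration based only on the excerpt above: it expects the columns "section:offset", decorated name, "Rva+Base" and, as the last token, the object file. The names MapSymbol, parseHex and parseMapLine are hypothetical.

```cpp
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>

// One entry of the "Publics by Value" or "Static symbols" list.
struct MapSymbol {
    std::string symbol;     // decorated name
    std::uint32_t address;  // value of the "Rva+Base" column
    std::string object;     // last column, e.g. splash.obj
};

// Converts up to 8 hexadecimal digits; returns false for anything else.
static bool parseHex(const std::string& text, std::uint32_t& value) {
    if (text.empty() || text.size() > 8) return false;
    std::uint32_t result = 0;
    for (char c : text) {
        unsigned char u = static_cast<unsigned char>(c);
        if (!std::isxdigit(u)) return false;
        int digit = std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10;
        result = result * 16 + static_cast<std::uint32_t>(digit);
    }
    value = result;
    return true;
}

// Parses a line such as
//   0001:000002e0  ??1OUString@rtl@@QAE@XZ  100012e0 f i splash.obj
// Returns false for lines of a different shape (headers, section table, "...").
bool parseMapLine(const std::string& line, MapSymbol& out) {
    std::istringstream in(line);
    std::string location, symbol, rvaBase;
    if (!(in >> location >> symbol >> rvaBase)) return false;

    // The first column must look like ssss:oooooooo.
    std::uint32_t section = 0, offset = 0, address = 0;
    if (location.size() != 13 || location[4] != ':') return false;
    if (!parseHex(location.substr(0, 4), section)) return false;
    if (!parseHex(location.substr(5), offset)) return false;
    if (!parseHex(rvaBase, address)) return false;

    // Skip optional flags such as "f" and "i"; the last token is the object.
    std::string token, object;
    while (in >> token) object = token;

    out.symbol = symbol;
    out.address = address;
    out.object = object;
    return true;
}
```

The parsed entries could be stored in a std::map keyed by address to look up the symbol for a traced function. The "Rva+Base" values refer to the preferred load address, so traced addresses have to be adjusted first if the module was loaded at a different base.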

How to create an ORDER file that is accepted by the linker

There is no official documentation of the decoration scheme Microsoft uses for its C++ compilers. A very comprehensive description can be found on the following Wikipedia page: http://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B_Name_Mangling.

Problems

There are some problems with the ORDER file and the linker.

  • The linker crashes reproducibly if very long symbols are within the ORDER file. A symbol with a length of 1670 characters works, a symbol with 3345 characters results in a crash. It must be verified whether the linker can order a symbol if it uses fewer characters.
  • The compiler uses a random number for every type that is declared in an anonymous or counted namespace. This number is newly created for every new compile process. Therefore these symbols cannot be used in ORDER files, as the trace code needs an instrumented build which has its own random numbers.

Linux

MacOS X

The following web page from Apple describes what must be done to reorder code/data of a library to improve locality.

Solaris

OpenOffice.org uses the Sun Studio C++ compiler suite for building on both Sparc and x86 CPU systems. The following web page describes what can be done to optimize the code layout of libraries with the Sun Studio C++ compiler suite.

First results

First tests on Windows with reordered symbols (all symbols which are needed during start up are sorted to the start of the library) show that up to 40% fewer page faults can be reached (measured with Process Monitor). The following table provides numbers for some libraries. These are very early test results with prototype code; hopefully it can be optimized further. There are also some strange results which must be analyzed in more detail.

Module      Page faults      Page faults   Time for all reads  Time for all reads  Improvement
            (non-optimized)  (optimized)   (non-optimized)     (optimized)         (page faults / time)
swmi.dll    210              125           1378 ms             866 ms              40% / 37%
sfxmi.dll   110              86            912 ms              659 ms              22% / 28%
vclmi.dll   100              87            645 ms              525 ms              13% / 19%
fwkmi.dll   72               68            422 ms              544 ms              6% / -29% (*)
tlmi.dll    26               26            158 ms              195 ms              0% / -23% (*)

(*) It must be analyzed why the optimized versions of the libraries have a higher load time. Currently the cause is unknown.
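
The improvement column can be recomputed from the raw numbers. The following helper is only a sketch with hypothetical function names; it shows the calculation assumed here, which is the difference relative to the non-optimized value. Negative results mean that the optimized library performed worse.

```cpp
#include <cmath>

// Relative improvement in percent: (before - after) / before * 100.
// Negative values mean the optimized build is worse than the original.
double improvementPercent(double before, double after) {
    return (before - after) / before * 100.0;
}

// Same value rounded to the nearest whole percent, as used in the table.
long roundedImprovementPercent(double before, double after) {
    return std::lround(improvementPercent(before, after));
}
```

For sfxmi.dll this gives 22 for the page faults (110 to 86) and 28 for the read time (912 ms to 659 ms). For the read time of fwkmi.dll (422 ms to 544 ms) it gives -29.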

Overall the load time for the first test set (8 optimized libraries) could be reduced from 3859ms to 3149ms. This is an improvement of about 18%.
