Configmgr Refactoring

From Apache OpenOffice Wiki
Note: The configmgr has been re-implemented for OpenOffice.org 3.3; see Performance/Configuration. This page describes an alternative approach that was not implemented, so it is largely obsolete.

Summary

The configmgr module is widely regarded as inefficient. Many engineers have refactored and tuned parts of it; some of these efforts improved performance and some did not, and the module is still not efficient enough. We therefore propose refactoring it around a new concept: moving most operations (such as layer merging) out of application runtime and into compile or packaging time. The configuration data, which is currently obtained by merging layers dynamically at runtime, becomes more static, so it can be read directly from a single XML (or binary) file.

To support this concept, we modified some of the requirements for configmgr; this does not affect application functionality.

Understanding the Requirements

Requirements of configmgr:

  1. Availability on all supported platforms of OpenOffice.org.
  2. Capability of efficient storing and retrieval of thousands of configuration key-value pairs.
  3. Support for multiple backends, e.g. file-based or LDAP.
  4. Support for multiple layers with merging capabilities of default and user-defined settings, in a human-readable format, which is important for product support.
  5. Easy to define and use for application developers.

Default configuration and user-defined configuration are a conceptual division at the requirements level. An application needs only the merged result and does not care which layer a value comes from. The merged result can be produced at application runtime, but it can equally be produced at compile, packaging, or installation time, provided the configuration data is known to be correct. It is the dynamic multilayer merging at runtime that causes the serious performance problem (see Performance Bottleneck). We abandon the multilayer concept to strike a balance among requirements, security, and performance. In fact, the requirements can be met with a simple data model, without introducing a multilayer concept that adds complexity.

Security is sometimes raised as a concern. We do not consider it a serious problem for the purposes of this proposal.

Performance Bottleneck

The current implementation of configmgr is shown below:

Config1.jpg

On the first run of the application, XML parsers build an INode tree structure for a component module from the schema (xcs), merge the default layer into it, and save the result in binary format in the user directory. From the second startup onward, the application loads the binary file to rebuild the INode structure and merges the user layer into it. To support the UNO API, which requires hierarchical path access and SET nodes, the program builds an API tree structure, containing many TreeFragments, from the INode structure. When modified configuration data is written back to the layers, the program first matches the data against the memory cache, commits it to an update-list-like structure, and writes it to the user layer later at an appropriate time.
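The deferred write described in the last step can be illustrated with a minimal sketch. It is not the configmgr code: the class name `WriteBehindLayer`, its methods, and the use of plain dictionaries in place of the cache, the update list, and the user-layer file are all assumptions made for illustration.

```python
class WriteBehindLayer:
    """Hypothetical stand-in for the memory cache plus update list."""

    def __init__(self, user_layer=None):
        self.cache = {}                           # values seen by the application
        self.pending = {}                         # update list: path -> latest value
        self.user_layer = dict(user_layer or {})  # stand-in for the user-layer file
        self.cache.update(self.user_layer)

    def set(self, path, value):
        """Update the cache at once; remember the change for a later write."""
        self.cache[path] = value
        self.pending[path] = value

    def flush(self):
        """Write all pending changes to the user layer in one pass."""
        written = len(self.pending)
        self.user_layer.update(self.pending)
        self.pending.clear()
        return written


layer = WriteBehindLayer()
layer.set("/L10N/Decimal", ",")
layer.set("/L10N/Decimal", ".")   # a second change to the same key replaces the first
visible_before_flush = layer.cache["/L10N/Decimal"]
on_disk_before_flush = dict(layer.user_layer)
written = layer.flush()
```

In this sketch the application sees a change immediately through the cache, while the user layer is only touched when `flush` is called.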

Performance bottleneck:

The multilayer merging concept, together with the complicated data structures and algorithms it requires, is the main cause. Layers are processed in a Layer and Handler fashion, which means that every operation on a layer requires XML parsing. As a result:

I/O waste: A single access to one configuration value in one component module requires loading an xcs, a default-layer xcu, and a user-layer xcu (or the default-layer binary structure and a user-layer xcu). There are more than 40 logical configuration component modules. If application startup needs the configuration data of 20 component modules, then 40 or 60 configuration files must be loaded. This is a great waste of I/O.
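The 40 and 60 figures follow from two or three files per module. The small sketch below only restates that arithmetic; the function name `files_loaded` and its parameters are hypothetical.

```python
def files_loaded(modules_needed: int, binary_cache: bool) -> int:
    """Count configuration files read at startup under the layered design."""
    # Without the binary cache: xcs schema + default-layer xcu + user-layer xcu.
    # With it: default-layer binary structure + user-layer xcu.
    files_per_module = 2 if binary_cache else 3
    return modules_needed * files_per_module


without_binary = files_loaded(20, binary_cache=False)  # 20 * 3 files
with_binary = files_loaded(20, binary_cache=True)      # 20 * 2 files
```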

Merging: When a configuration value is accessed, the program has to parse and merge the layers into an INode tree and then build a TreeFragment structure from it. During merging, the parser takes every node and searches for the matching node in the previous layer. Searching a tree is time-consuming, and because a layer contains many nodes, many such searches run against the previous layer's tree structure. Even if the tree search itself is optimized, there are still too many searches. Building the TreeFragment structure from the INode structure also costs considerable memory allocation and processing time. Moreover, merging and building create many name strings; when these objects are passed as parameters, their reference counts are incremented, which takes a lock each time. This is why configmgr produces so many locks and has become a well-known hot spot.
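To make the search cost concrete, here is a minimal sketch of merging one layer into another with a per-node lookup. It is a simplified model, not the INode implementation: the `Node` class, the linear search among siblings, the comparison counter, and the example node names are all assumptions.

```python
class Node:
    """Hypothetical stand-in for an INode: a name, an optional value, children."""

    def __init__(self, name, value=None, children=None):
        self.name = name
        self.value = value
        self.children = children if children is not None else []


def merge_layer(base, layer, stats):
    """Merge `layer` into `base` in place, counting name comparisons."""
    if layer.value is not None:
        base.value = layer.value
    for layer_child in layer.children:
        match = None
        # Search the previous layer's children for a node with the same name.
        for base_child in base.children:
            stats["comparisons"] += 1
            if base_child.name == layer_child.name:
                match = base_child
                break
        if match is None:
            # The node exists only in the upper layer (for example a new set entry).
            match = Node(layer_child.name)
            base.children.append(match)
        merge_layer(match, layer_child, stats)


defaults = Node("Setup", children=[
    Node("Office", children=[Node("Locale", "en-US")]),
    Node("L10N", children=[Node("Decimal", ".")]),
])
user = Node("Setup", children=[
    Node("L10N", children=[Node("Decimal", ",")]),
])
stats = {"comparisons": 0}
merge_layer(defaults, user, stats)
merged_decimal = defaults.children[1].children[0].value
```

Even in this tiny example, overriding one value takes three name comparisons; the count grows with the number of nodes in each layer and is paid again for every layer merged at runtime.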

Refactoring

The refactoring is shown in the drawing below:

Config2.jpg

There is no longer any merging at runtime. Only one XML data file is loaded and read at runtime. The new XML format is defined by us. We provide a new tool that merges and transforms the existing xcs and xcu files into the new XML format during compilation or packaging, without affecting developers or the way they currently define configuration. This gives us a simple configuration data access model and avoids the complicated merging operations and data structures. The data can be accessed directly from the XML file. If parsing and reading data from XML turns out not to be efficient enough, a binary format can be provided as well. This simple concept has several advantages:

Simplifies the access procedure and avoids redundant data and calls.

Decreases the number of files and reduces I/O cost.

Decreases the number of data objects, which reduces locking and can solve part of the memory fragmentation problem.

Easy to adapt to the UNO API.

Easier to maintain, with simpler configuration directories.

A more understandable XML format.

No change to UNO API access to the configuration, so application developers need not worry about any inconvenience.
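The split between a build-time merge and a merge-free runtime load can be sketched as follows. This page does not specify the new XML format, so the `component` and `item` elements, the path-keyed dictionaries standing in for parsed xcs and xcu content, and all function names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def merge_layers(*layers):
    """Build time: merge layers in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def to_unified_xml(component, values):
    """Build time: serialize the merged values to a hypothetical single file."""
    root = ET.Element("component", {"name": component})
    for path in sorted(values):
        item = ET.SubElement(root, "item", {"path": path})
        item.text = values[path]
    return ET.tostring(root, encoding="unicode")


def load_unified_xml(text):
    """Runtime: one parse of one file, with no merging."""
    root = ET.fromstring(text)
    return {item.get("path"): item.text or "" for item in root.iter("item")}


schema_defaults = {"/L10N/Decimal": ".", "/Office/Locale": "en-US"}
default_layer = {"/Office/Locale": "de"}
xml_text = to_unified_xml("Setup", merge_layers(schema_defaults, default_layer))
config = load_unified_xml(xml_text)
```

The runtime function never sees the individual layers; all override logic has already been applied by the time the file is written.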

File Deployment

Config3.jpg

The left drawing in the picture above shows the current development workflow. Developers define their configuration data in xcs and xcu files, together with an sdf file that provides the locale information. During compilation and packaging, a tool named xsltproc is used to split the xcu and sdf files into individual files for each logical component module. The share directory in the final product's program home directory therefore contains all of these individual configuration XML files and subdirectories.

The right drawing shows the new workflow. Developers define their data in the same way during development. During compilation and packaging, however, a new tool replaces xsltproc and builds and transforms the xcs, xcu, and locale sdf files into a unified XML file format. The product's share directory then has a simpler directory hierarchy that contains only configuration XML files, one per component module. When the application runs for the first time, the program copies the unified XML files to the user directory; from then on, all configuration access happens on the XML files in the user directory unless the default configuration data changes.
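The first-run copy can be sketched as below. The function name `user_config_path`, the one-file-per-component naming, and the directory layout are assumptions; how a change in the default configuration data is detected and handled is not specified on this page, so the sketch leaves that case out.

```python
import shutil
import tempfile
from pathlib import Path


def user_config_path(share_dir, user_dir, component):
    """Return the user-directory copy of a component file, creating it on first run."""
    target = user_dir / f"{component}.xml"
    if not target.exists():
        # First run: seed the user directory from the shipped, pre-merged file.
        user_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(share_dir / f"{component}.xml", target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    share, user = Path(tmp) / "share", Path(tmp) / "user"
    share.mkdir()
    (share / "Setup.xml").write_text("<component name='Setup'/>")

    path = user_config_path(share, user, "Setup")    # first run: copies the file
    first_run_copied = path.exists()

    path.write_text("<component name='Setup' edited='true'/>")
    again = user_config_path(share, user, "Setup")   # later runs: reuse the copy
    kept_user_edit = "edited" in again.read_text()
```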

Performance Evaluation

The simple access model eliminates most runtime merging and data transformation, and so greatly reduces the number of data objects, resource locks, and call chains. Compared with the multilayer implementation, this should yield a considerable performance gain. Moreover, the lock hot spot is expected to disappear. Note that after merging, the number of configuration data files decreases by about two thirds, which can save a lot of I/O time, especially at cold start.

Other

Application modules such as Writer cache configuration data items themselves, and the configmgr module has a cache of its own. In theory this is redundant, and it makes it hard to notify all of these caches when configuration data changes. The way developers use configuration data needs to be standardized.

Related Links

Configmgr_Refactoring/Design

Configmgr_Refactoring/test_result_2009_02_19
