Difference between revisions of "DbConfig"

From Apache OpenOffice Wiki

Latest revision as of 09:35, 24 February 2010

Motivation

In our initial investigations of cold-start performance, we theorized that having to crawl hundreds of small XML files hurt performance. We reasoned that combining the hundreds of files into one or two containers would let the buffer-cache work much more effectively and reduce the startup time.

Experiments

We performed several experiments to estimate the performance gains that could be had. We tried three methodologies, all of which indicated that we could expect roughly a 1 second improvement on a cold-start of writer. Our test machine for these experiments was a 3.2 GHz Pentium 4HT with 1 gigabyte of RAM and a 120 GB 7200 rpm SATA drive, running NLD9.

Prestuff

In this experiment, we pre-stuffed the buffer-cache with the xcu files before startup. This was accomplished by running cat:

cat xcu_file_list | xargs cat | cat > /dev/null

Ramdisk

We created two ramdisks and put the configuration data in them. This was a little coarser, because the whole contents of the registry went onto the ramdisk, which also includes the cache.

Const Char*

We created a header file that had all of the contents of the xcu files as const character strings. A simple search function was crafted that took a URL and found the appropriate string. We changed the implementation of XLayer.readData to look into the strings instead of the disk. Writes were left going to the disk.

Approaches

At first, we tried to modify localbe to use one large XML file rather than hundreds of smaller ones. This proved to be quite difficult, because the "canned" parser does not expect multiple components per file. Because of the expat interface, it is difficult to craft a system for querying the XML parser for relevant tags and sorting the data. Thus, it was decided that a database system would be the fastest to implement.

Berkeley DB was chosen because it was used elsewhere in the code and is very simple. A simple schema was implemented, where the key was the URL of the xcu file and the value was a struct with the timestamp and the XML blob. There was also a special key, the list of keys, so that searching for children could be done in a moderately efficient manner.

Prototype Changes to localbe

The changes were to BasicLocalFileLayer::readData and replaceWith. Additionally, LocalMultiStratum::listLayerId had to be changed. These changes were quick and dirty, and in no way production quality. We left LocalSingleBackend and LocalHierarchyBrowserSvc alone, because they weren't used in simple uses of openoffice, and this was just an experiment. We also introduced Database, a singleton that encapsulates the two databases.

Results

Our performance testing methodology was to use a script that launched openoffice and had it output RTL_LOG data. After a short delay (roughly one minute), the system would be rebooted and the process would repeat. Our comparison is based on a build made with ooo-build. The version of ooo-build is ooo-build-2.0.0.2, and it is compiled with TIMELOG and -g. The test machine for these tests is an IBM T42 Thinkpad (1.7 GHz Centrino) with 1 gigabyte of RAM and a 40 GB 5400 rpm drive, running NLD9.

For starting writer, the startup gain was again roughly 1 second. The average start time of the unmodified build is 10.589 seconds, and the average start time of our modified version is 9.234 seconds, over 5 samples. However, we cannot simply claim the full 1.355 second difference as an improvement, because we modified our startup script to prestuff the buffer-cache with the two database files. This takes 440 milliseconds, so our net gain is 915 milliseconds. While this is only 5 samples, a similar test of 50 runs that just started the framework/shell (soffice, with no arguments) showed a net gain of ~750 milliseconds. On a sample set of 20, starting calc is also accelerated, with a net gain of 967 milliseconds.

It should be noted, however, that the results vary across machines, in particular the speedup for desktops is less noticeable.

DBBE

We have crafted a new backend, called dbbe, that uses XML files encapsulated in Berkeley DB databases. This backend is similar to localbe in many ways and has some code that is "stolen" from there. It implements the following services:

  • Layer
  • UpdatableLayer
  • SingleLayerStratum
  • MultiLayerStratum

Additionally, we have a utility that can import and export xcu files into databases for migration.

Basic Design

The Berkeley DB database stores key/data pairs with no relations. These pairs are of arbitrary type and size. We've made an abstraction of the Berkeley DB API that can put and get Records by key. Our Single/MultiLayerStratums list and get Layers, each of which deals with an individual Record.

Layer/Updatable Layer

We define a schema for our Layers and UpdatableLayers that uses a namespace prefix concatenated with the Configuration Item as a key. The namespace convention is the same as the one localbe uses for its three managed strata: res, data, and modules. We use the :: separator because (as far as I know) it is not valid in a path, and it makes a nice convention. Additionally, we use another :: separator after the Configuration Item name to signify a sublayer. So, for example, we would have the keys:

  • data::org.openoffice.Office.Labels
  • modules::org.openoffice.TypeDetection.Types.fcfg_math_types

For sublayers, the mapping would be as follows:

  • /registry/modules/org/openoffice/Setup/Setup-draw.xcu
  • /registry/modules/org/openoffice/Setup/Setup-writer.xcu
  • /registry/modules/org/openoffice/Setup/Setup-calc.xcu
get mapped to:
  • modules::org.openoffice.Setup::Setup-draw
  • modules::org.openoffice.Setup::Setup-writer
  • modules::org.openoffice.Setup::Setup-calc
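
The key convention above can be sketched as a pair of small helpers. This is only an illustration of the namespace::item::sublayer scheme; the names makeKey and splitKey are hypothetical and are not taken from the dbbe sources.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from dbbe): builds "namespace::item" or,
// when a sublayer name is given, "namespace::item::sublayer".
std::string makeKey(const std::string& ns,
                    const std::string& item,
                    const std::string& sublayer = "")
{
    std::string key = ns + "::" + item;
    if (!sublayer.empty())
        key += "::" + sublayer;
    return key;
}

// Hypothetical helper (not from dbbe): splits a key on "::" into its
// two or three parts.
std::vector<std::string> splitKey(const std::string& key)
{
    static const std::string sep = "::";
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = key.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(key.substr(start));   // last part
            return parts;
        }
        parts.push_back(key.substr(start, pos - start));
        start = pos + sep.size();
    }
}
```

With these helpers, makeKey("modules", "org.openoffice.Setup", "Setup-draw") yields the first mapped key in the list above, and splitting a three-part key gives the sublayer name as its last element.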

Note: localbe uses a strange system for language packs that we will not adopt. So, in localbe, we have: /res/en-US/org/openoffice/Office/UI/BasicIDECommands, where en-US is a sublayer. In our backend, dbbe, we would have: res::org.openoffice.Office.UI.BasicIDECommands::en-US, where en-US is a sublayer.

Note: the use of the word sublayer is somewhat misleading. The Record object does not differentiate sublayer blobs from anonymous configuration "particles". It calls both sublayers.

The data can be thought of as a struct defined like so:

struct Record
{
    sal_Int64  date;          /** Unix Epoch */
    sal_uInt32 blobSize;      /** XML Blob Size */
    sal_uInt32 numSubLayers;
    sal_Char   *pSubLayers;   /** ARGV style vector */
    sal_Char   *pBlob;        /** XML Blob */
};
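
To make the "ARGV style vector" idea concrete, here is a minimal sketch of how such a record could be flattened into the single value that a key/data store holds. The byte layout (header fields in host byte order, then NUL-terminated sublayer names, then the blob) is an assumption made for illustration; the layout dbbe actually uses is not described here. RecordData, pack and unpack are hypothetical names, and the sketch does no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Illustrative in-memory form of a Record; the fields mirror the struct above.
struct RecordData {
    std::int64_t date = 0;                // seconds since the Unix epoch
    std::vector<std::string> subLayers;   // sublayer names
    std::string blob;                     // XML blob
};

// Assumed layout: date | blobSize | numSubLayers | names (NUL-terminated) | blob
std::vector<char> pack(const RecordData& r)
{
    const std::uint32_t blobSize = static_cast<std::uint32_t>(r.blob.size());
    const std::uint32_t numSubLayers = static_cast<std::uint32_t>(r.subLayers.size());

    std::vector<char> out;
    auto append = [&out](const void* p, std::size_t n) {
        const char* c = static_cast<const char*>(p);
        out.insert(out.end(), c, c + n);
    };
    append(&r.date, sizeof r.date);
    append(&blobSize, sizeof blobSize);
    append(&numSubLayers, sizeof numSubLayers);
    for (const std::string& name : r.subLayers)
        append(name.c_str(), name.size() + 1);   // keep the terminating NUL
    append(r.blob.data(), r.blob.size());
    return out;
}

// Inverse of pack(); expects a buffer that pack() produced.
RecordData unpack(const std::vector<char>& in)
{
    RecordData r;
    std::uint32_t blobSize = 0;
    std::uint32_t numSubLayers = 0;
    std::size_t off = 0;
    auto read = [&](void* p, std::size_t n) {
        std::memcpy(p, in.data() + off, n);
        off += n;
    };
    read(&r.date, sizeof r.date);
    read(&blobSize, sizeof blobSize);
    read(&numSubLayers, sizeof numSubLayers);
    for (std::uint32_t i = 0; i < numSubLayers; ++i) {
        const std::string name(in.data() + off);  // reads up to the NUL
        off += name.size() + 1;
        r.subLayers.push_back(name);
    }
    r.blob.assign(in.data() + off, blobSize);
    return r;
}
```

Keeping the names as one run of NUL-terminated strings means the whole record stays a single contiguous value, which is what a put/get-by-key store expects.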

Each layer is implemented in a similar way to localbe, with a base class for the simple aspects of XLayer and XTimeStamped and a base class for the common elements of XCompositeLayer and XTimeStamped. The readData method will be implemented like in the experiment, with comphelper creating a stream from a ByteSequence. We use the URL property, but define the "URL" as <dbPath>:Key.

SingleStratum/MultiStratum

Like in localbe, we use a base class to implement XBackendEntities and then have a single MultiStratum class to provide XMultiStratum. However, the approach in localbe for SingleLayerStratum is now somewhat archaic. To refresh, in localbe:

  • LocalStratumBase
    • LocalSingleStratumBase
      • LocalDataStratum
      • LocalReadOnlyStratum
      • LocalResourceStratum
      • LocalSingleStratum

In our implementation, we have a simpler class hierarchy with only two classes derived from BaseStratum, which implement XMultiLayerStratum and XSingleLayerStratum. For a given layer, we ask the database whether it has any "SubLayers" to decide whether getLayer and the like should return a CompositeLayer, a Layer, or the updatable version of each. We do this by checking whether the Layer has any sublayers and whether the database it came from is read-only.
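
The decision described above comes down to two yes/no questions. A minimal sketch follows; the enum and function names are hypothetical and only illustrate the rule, not how dbbe actually spells it.

```cpp
// Hypothetical names for the four kinds of layer object that can be returned.
enum class LayerKind {
    Layer,
    UpdatableLayer,
    CompositeLayer,
    UpdatableCompositeLayer
};

// A record with sublayers gets a composite layer; a record from a
// writable database gets the updatable version.
LayerKind chooseLayerKind(bool hasSubLayers, bool databaseReadOnly)
{
    if (hasSubLayers)
        return databaseReadOnly ? LayerKind::CompositeLayer
                                : LayerKind::UpdatableCompositeLayer;
    return databaseReadOnly ? LayerKind::Layer
                            : LayerKind::UpdatableLayer;
}
```

Because the choice depends only on the record and its database, no separate stratum class is needed for each combination.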

Factory/Database Abstraction

We have a class, Database, that abstracts the Berkeley DB API from the rest of dbbe. This is done to consolidate access to database-internal settings so that they can all be changed in one place.

Because we have multiple stratum services that are using the same database (just namespaces within them), we provide a factory that can instantiate databases. This allows the services specified in the configmgrc to specify a database path and namespace prefix to work in. Additionally, this allows for the case of more than the original two envisioned databases to be used. However, making more databases will degrade performance.
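
One way such a factory could work is to hand out a single shared handle per database path, so that strata naming the same file reuse it. The sketch below assumes that sharing behaviour. The Database class here is only a stand-in that remembers its path and does not touch Berkeley DB, and DatabaseFactory and its members are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for the real Database class; it only remembers its path.
class Database {
public:
    explicit Database(const std::string& path) : path_(path) {}
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

// Hypothetical factory: one shared Database per path.
class DatabaseFactory {
public:
    std::shared_ptr<Database> get(const std::string& path)
    {
        std::shared_ptr<Database>& slot = cache_[path];
        if (!slot)
            slot = std::make_shared<Database>(path);   // first request for this path
        return slot;
    }

    // Number of distinct databases handed out so far.
    std::size_t openCount() const { return cache_.size(); }

private:
    std::map<std::string, std::shared_ptr<Database>> cache_;
};
```

Under this assumption, two strata that differ only in their namespace prefix would receive the same Database object, and only a genuinely new path would add another open database.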

Import/Export

Because this is a fundamentally non-human-editable data store, we provide a utility to populate and edit the contents of the database. This utility is meant to be a stand-alone binary suitable for packages to use in their post-install scripts. We support the following basic operations:

  • import files
  • export files
  • integrity check
  • statistics on database (list keys)

For import and export, we use a "code" to mangle/demangle key names into file paths/names for localbe. The scheme that localbe uses is automatically detected, and a correct code is supplied. However, for other cases, it is possible to supply a code for how names are mapped to keys. The code is very simple; it just specifies what parts of the path are namespaces, layers, and "sublayers." The Mangler class handles this name mangling/demangling, and the repository class handles the import/export of keys.

Performance

Just how fast is it? Benchmarks indicate that it is up to 16% faster on cold-starts.

The methodology of testing is as follows: The test machine is a 3.2 GHz P4HT running OpenSuSE 10.0 with 1 GB of RAM and 120 GB of SATA-attached storage. OpenOffice is built with ooo-build. OpenOffice is started by a script in the X session that sleeps for two minutes before launching with the options -norestore -<application>. The system then sleeps for another two minutes and reboots.

[Figure: Median Time for 100 Cold-Starts]

The Code

The code is in CWS: configdbbe

Also, you can see a doxygen run of dbbe at http://www.moonleib.org/dbbe/html/da/ddb/namespaceconfigmgr_1_1dbbe.html
