Difference between revisions of "DbConfig"

Revision as of 06:15, 6 February 2006

Motivation

In our initial investigations of cold-start performance, we theorized that having to crawl hundreds of small XML files was hurting performance. We reasoned that combining the hundreds of files into one or two containers would let the buffer cache work much more effectively and reduce the startup time.

Experiments

We performed several experiments to estimate the performance gains that could be had. We tried three methodologies, all of which indicated that we could expect roughly a 1 second improvement on a cold start of Writer. Our test machine for these experiments was a 3.2 GHz Pentium 4 HT with 1 gigabyte of RAM and a 120 GB 7200 rpm SATA drive, running NLD9.

Prestuff

In this experiment, we pre-stuffed the buffer-cache with the xcu files before startup. This was accomplished by running cat:

cat xcu_file_list | xargs cat | cat > /dev/null
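The same prestuff step can be sketched in Python, which makes it easier to time or log. This is only an illustration, not the script used in the experiment: the function name prestuff is hypothetical, and it assumes the list file holds one path per line (the xargs form above splits on any whitespace).

```python
def prestuff(list_path, chunk_size=1 << 20):
    """Read every file named in list_path and throw the bytes away.

    Reading is the whole point: it pulls each file's pages into the
    OS buffer cache. Returns the total number of bytes read.
    Assumes one path per line in the list file.
    """
    total = 0
    with open(list_path, "r", encoding="utf-8") as listing:
        for line in listing:
            path = line.strip()
            if not path:
                continue  # skip blank lines in the list
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

Calling prestuff("xcu_file_list") before launching the application plays the same role as the cat pipeline.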

Ramdisk

We created two ramdisks and put the configuration data in them. This was a little coarser, because the whole registry was placed in the ramdisk, and the registry also includes the cache.

Const Char*

We created a header file that contained the entire contents of the xcu files as const character strings. A simple search function was written that took a URL and found the appropriate string. We changed the implementation of XLayer.readData to look in the strings instead of on disk. Writes still went to disk.
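The lookup idea can be illustrated with a small Python stand-in for the C++ header. Everything here is an assumption made for illustration: the table name EMBEDDED, the function find_embedded, the example URLs and the XML contents are all hypothetical, and binary search is just one possible "simple search function" (the original may well have been a linear scan).

```python
from bisect import bisect_left

# Hypothetical stand-in for the generated header: (url, contents) pairs,
# kept sorted by URL so that they can be binary-searched.
EMBEDDED = (
    ("registry/data/org/example/Common.xcu", "<common/>"),
    ("registry/data/org/example/Setup.xcu", "<setup/>"),
)
_URLS = [url for url, _ in EMBEDDED]


def find_embedded(url):
    """Return the embedded contents for url, or None if it is not compiled in."""
    i = bisect_left(_URLS, url)
    if i < len(_URLS) and _URLS[i] == url:
        return EMBEDDED[i][1]
    return None
```

A read path built on this would call find_embedded first and never touch the disk for a hit, while the write path stays unchanged.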

Leveraged Backend

At first, we tried to modify localbe to use one large XML file rather than hundreds of smaller ones. This proved to be quite difficult, because the "canned" parser does not expect multiple components per file. Because of the expat interface, it is difficult to craft a system that queries the XML parser for relevant tags and sorts the data. Thus, it was decided that a database system would be the quickest to implement.

Berkeley DB was chosen because it was already used elsewhere in the code and is very simple. A simple schema was implemented, in which the key was the URL of the xcu file and the value was a struct holding the timestamp and the XML blob. There was also a special key, holding the list of keys, so that searching for children could be done in a moderately efficient manner.
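A minimal sketch of that schema, under stated assumptions: a plain Python dict stands in for the Berkeley DB handle, the value layout (an 8-byte timestamp followed by the XML bytes), the special key name KEY_LIST and all function names are hypothetical, and "children" are taken to be keys sharing a URL prefix. URLs are assumed not to contain newlines.

```python
import struct

KEY_LIST = b"\x00keys"          # hypothetical name of the special "list of keys" entry
_HEADER = struct.Struct("<q")   # 8-byte little-endian timestamp


def pack_value(timestamp, xml_blob):
    """Serialize the value struct: timestamp followed by the XML bytes."""
    return _HEADER.pack(timestamp) + xml_blob


def unpack_value(raw):
    """Split a stored value back into (timestamp, xml_blob)."""
    (timestamp,) = _HEADER.unpack_from(raw)
    return timestamp, raw[_HEADER.size:]


def list_keys(db):
    """Return all stored URLs, read from the special key."""
    raw = db.get(KEY_LIST)
    return raw.decode("utf-8").split("\n") if raw else []


def put(db, url, timestamp, xml_blob):
    """Store one xcu file and keep the list-of-keys entry up to date."""
    db[url.encode("utf-8")] = pack_value(timestamp, xml_blob)
    keys = list_keys(db)
    if url not in keys:
        keys.append(url)
        keys.sort()
        db[KEY_LIST] = "\n".join(keys).encode("utf-8")


def get(db, url):
    """Return (timestamp, xml_blob) for url, or None if absent."""
    raw = db.get(url.encode("utf-8"))
    return None if raw is None else unpack_value(raw)


def children(db, prefix):
    """Find child entries by scanning the key list for a URL prefix."""
    return [k for k in list_keys(db) if k.startswith(prefix)]
```

Keeping the key list under one entry means a child search costs a single database read plus an in-memory scan, rather than a cursor walk over every record.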

The changes were to XLayer, readData and replaceWith. Additionally, the *Stratum classes had to be changed. These changes were quick and dirty, and in no way production quality. We left LocalSingleBackend and LocalHierarchyBrowserSvc alone, because they weren't used in simple uses of OpenOffice.org, and this was just an experiment. We introduced a new class, Database, which is a simple singleton wrapper around two databases: one for the user configuration data and one for the system-level configuration.
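The shape of such a wrapper might look roughly like the following Python sketch. Only the class name Database and the user/system split come from the text above; the attribute names, the instance() accessor and the use of dicts in place of real database handles are assumptions, and the sketch is not thread-safe.

```python
class Database:
    """Singleton holding two key/value stores: user and system configuration."""

    _instance = None

    def __init__(self):
        # Plain dicts stand in for the two database handles.
        self.user = {}
        self.system = {}

    @classmethod
    def instance(cls):
        """Create the single shared instance on first use, then reuse it."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
```

Every caller that goes through Database.instance() sees the same pair of stores, so the databases are opened once per process.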

Our performance testing methodology was to use a script that launched OpenOffice.org and had it output RTL_LOG data. After a short delay (roughly one minute), the system was rebooted and the process repeated. The results were mixed: we have had trouble reproducing them on all machines. For starting Writer, the improvement in startup time was again roughly 1 second.

The machine for these tests was an IBM T42 ThinkPad (1.7 GHz Centrino) with 1 gigabyte of RAM and a 40 GB 5400 rpm drive, running NLD9. Our comparison is based on a build made with ooo-build. The milestone is (insert detail here), and it is compiled with TIMELOG and -g.

Over 5 samples, the average start time of the vanilla build is 10.589 seconds and that of our modified version is 9.234 seconds. However, we cannot simply claim the full 1.355 second difference, because we modified our startup script to prestuff the buffer cache with the two database files. This takes 440 milliseconds, so our net gain is 915 milliseconds. While this is only 5 samples, a similar test of 50 runs that started just the framework/shell (soffice with no arguments) showed a net gain of ~750 milliseconds.
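The net-gain arithmetic for the Writer test can be written out explicitly. The three inputs are the figures quoted above, converted to milliseconds; the variable names are only illustrative.

```python
vanilla_ms = 10589    # average cold start, unmodified build
modified_ms = 9234    # average cold start, database-backed build
prestuff_ms = 440     # time spent prestuffing the two database files

gross_gain_ms = vanilla_ms - modified_ms      # 1355
net_gain_ms = gross_gain_ms - prestuff_ms     # 915

print(gross_gain_ms, net_gain_ms)  # prints "1355 915"
```

The prestuff cost is subtracted because it is paid by the startup script on every cold start, so it belongs to the modified version's real startup time.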
