FR/Documentation/HSQLDB Guide/ch05

From Apache OpenOffice Wiki
Chapter 5. Deployment Issues

Fred Toussi

HSQLDB Development Group

<ft@cluedup.com>

Copyright 2005 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

$Date: 2005/07/02 09:11:39 $


Purpose of this Document

Many questions frequently asked in the forums and mailing lists are answered in this guide. If you want to use HSQLDB with your application, you should read this guide. This document covers system-related issues. For issues related to SQL, see the SQL Issues chapter.


Mode of Operation and Tables

HSQLDB has modes of operation and features that allow it to be used in very different scenarios. Levels of memory usage, speed and accessibility are influenced by how HSQLDB is deployed.

Mode of Operation

The decision to run HSQLDB as a separate server process or as an in-process database should be based on the following:

  • When HSQLDB is run as a server on a separate machine, it is isolated from hardware failures and crashes of the host running the application.
  • When HSQLDB is run as a server on the same machine, it is isolated from application crashes and memory leaks.
  • Server connections are slower than in-process connections because of the overhead of streaming the data for each JDBC call.
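As a minimal sketch of the two choices, the following Java fragment builds the JDBC URL for an in-process database and for a server. The database path /opt/db/testdb, the host name dbhost and the database alias xdb are hypothetical, as is the helper class itself; only the URL prefixes are HSQLDB conventions.

```java
// Hypothetical helper: builds HSQLDB JDBC URLs for the two modes of operation.
class HsqldbUrls {

    // In-process: the engine runs inside the application's JVM and opens the files directly.
    static String inProcess(String filePath) {
        return "jdbc:hsqldb:file:" + filePath;
    }

    // Server: every JDBC call is streamed over the network to a separate HSQLDB process.
    static String server(String host, String alias) {
        return "jdbc:hsqldb:hsql://" + host + "/" + alias;
    }

    public static void main(String[] args) {
        System.out.println(inProcess("/opt/db/testdb"));
        System.out.println(server("dbhost", "xdb"));
        // With hsqldb.jar on the classpath and the org.hsqldb.jdbcDriver class loaded,
        // either URL can be passed to java.sql.DriverManager.getConnection(url, "sa", "").
    }
}
```

Application code stays the same in both cases; only the URL changes, so the mode can be chosen at deployment time.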

Tables

Text tables are designed for special applications where the data has to be in an interchangeable format, such as CSV. Text tables should not be used for routine storage of data.

MEMORY tables and CACHED tables are generally used for data storage. The differences between the two types are as follows:

  • The data for all MEMORY tables is read from the .script file when the database is started and is loaded into memory. In contrast, the data for CACHED tables is not read into memory until the table is accessed. Furthermore, only part of the data for each CACHED table is held in memory, which allows tables to exceed the total allocated memory.
  • When the database is shut down normally, all the data for MEMORY tables is written to disk. In comparison, only the data that has changed in CACHED tables is written at shutdown, plus a compressed backup of all the data in all CACHED tables.
  • The size and capacity of the data cache for all CACHED tables is configurable. This makes it possible to hold all the data of the CACHED tables in the memory cache. In this case, speed of access is good, although slightly slower than with MEMORY tables.
  • For most applications it is recommended to use MEMORY tables for small amounts of data, leaving CACHED tables for large data sets. For special applications in which speed is paramount and a large amount of free memory is available, MEMORY tables can be used for large tables as well.

Large Objects

JDBC Clobs are supported as columns of the type LONGVARCHAR. JDBC Blobs are supported as columns of the type LONGVARBINARY. When large objects (LONGVARCHAR, LONGVARBINARY, OBJECT) are stored with table definitions that contain several normal fields, it is better to use two tables instead: the first table contains the normal fields and the second table contains the large object plus an identity field. Using this method has two benefits. (a) The first table can usually be created as a MEMORY table while only the second table is a CACHED table. (b) The large objects can be retrieved individually using their identity, instead of getting loaded into memory for finding the rows during query processing. An example of two tables and a select query that exploits the separation between the two follows:

CREATE MEMORY TABLE MAINTABLE(MAINID INTEGER, ......);

CREATE CACHED TABLE LOBTABLE(LOBID INTEGER, LOBDATA LONGVARBINARY);

SELECT * FROM (SELECT * FROM MAINTABLE <join any other table> WHERE <various conditions apply>) JOIN LOBTABLE ON MAINID=LOBID;

The inner SELECT finds the required rows without reference to the LOBTABLE and when it has found all the rows, retrieves the required large objects from the LOBTABLE.

Deployment context

The files used for storing HSQLDB database data are all in the same directory. New files are always created and deleted by the database engine. Two simple principles must be observed:

  • The Java process running HSQLDB must have full privileges on the directory where the files are stored. This includes create and delete privileges.
  • The file system must have enough spare room both for the 'permanent' and 'temporary' files. The default maximum size of the .log file is 200MB. The .data file can grow to up to 8GB. The .backup file can be up to 50% of the .data file. The temporary file created at the time of a SHUTDOWN COMPACT can be equal in size to the .data file.
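The figures above can be combined into a rough worst-case disk budget. The sketch below is illustrative arithmetic only: the class and method names are hypothetical, it assumes the default 200MB .log limit, and it ignores the .script and .properties files.

```java
// Hypothetical helper: worst-case disk space for one HSQLDB database directory.
class DiskBudget {
    static final long MB = 1024L * 1024L;
    static final long MAX_LOG_BYTES = 200 * MB; // default maximum size of the .log file

    // dataBytes: expected size of the .data file (it can grow to up to 8GB).
    static long worstCaseBytes(long dataBytes) {
        long backup = dataBytes / 2;   // .backup can be up to 50% of the .data file
        long compactTemp = dataBytes;  // temporary file created by SHUTDOWN COMPACT
        return dataBytes + backup + compactTemp + MAX_LOG_BYTES;
    }

    public static void main(String[] args) {
        long data = 1024 * MB; // assume a 1GB .data file
        System.out.println(worstCaseBytes(data) / MB + " MB"); // prints "2760 MB"
    }
}
```

In other words, a directory holding a 1GB .data file should have roughly 2.7GB available before a SHUTDOWN COMPACT is attempted.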

Memory and Disk Use

Memory used by the program can be thought of as two distinct pools: memory used for table data, and memory used for building result sets and other internal operations. In addition, when transactions are used, memory is utilised for storing the information needed for a rollback.

Since version 1.7.1, memory use has been significantly reduced compared to previous versions. The memory used for a MEMORY table is the sum of memory used by each row. Each MEMORY table row is a Java object that has 2 int or reference variables. It contains an array of objects for the fields in the row. Each field is an object such as Integer, Long, String, etc. In addition each index on the table adds a node object to the row. Each node object has 6 int or reference variables. As a result, a table with just one column of type INTEGER will have four objects per row, with a total of 10 variables of 4 bytes each - currently taking up 80 bytes per row. Beyond this, each extra column in the table adds at least a few bytes to the size of each row.
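To put the per-row figure in perspective, the following sketch multiplies it by a row count. The class is hypothetical, and the 80-byte constant is simply the figure quoted above for a table with one INTEGER column, so the result is a lower bound for any real table.

```java
// Hypothetical helper: lower-bound memory estimate for a MEMORY table.
class MemoryTableEstimate {
    // Figure quoted in the text for a table with a single INTEGER column.
    static final int BYTES_PER_ROW = 80;

    static long minimumBytes(long rowCount) {
        return rowCount * BYTES_PER_ROW;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        // Integer division: 80,000,000 bytes is a little over 76 MB.
        System.out.println(minimumBytes(rows) / (1024 * 1024) + " MB");
    }
}
```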

The memory used for a result set row has fewer overheads (fewer variables and no index nodes) but still uses a lot of memory. All the rows in the result set are built in memory, so very large result sets may not be possible. In server mode databases, the result set memory is released from the server once the database server has returned the result set. In-process databases release the memory when the application program releases the java.sql.ResultSet object. Server modes require additional memory for returning result sets, as they convert the full result set into an array of bytes which is then transmitted to the client.

When UPDATE and DELETE queries are performed on CACHED tables, the full set of rows that are affected, including those affected due to ON UPDATE actions, is held in memory for the duration of the operation. This means it may not be possible to perform deletes or updates involving very large numbers of rows of CACHED tables. Such operations should be performed in smaller sets.
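One way to perform such an operation in smaller sets is to split it by key range. The sketch below only generates the statements; the table name BIGTABLE, its integer key column ID and the helper class are hypothetical, and in a real application each statement would be executed through JDBC and committed before the next one is issued.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large DELETE into several smaller ones by key range.
class ChunkedDelete {

    // Splits the key range [minId, maxId] into consecutive half-open ranges
    // [from, to), each covering at most chunkSize key values.
    static List<long[]> ranges(long minId, long maxId, long chunkSize) {
        List<long[]> out = new ArrayList<>();
        for (long from = minId; from <= maxId; from += chunkSize) {
            long to = Math.min(from + chunkSize, maxId + 1);
            out.add(new long[] {from, to});
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] r : ranges(1, 25, 10)) {
            // Each statement touches at most 10 key values, so the set of
            // affected rows held in memory stays small.
            System.out.println("DELETE FROM BIGTABLE WHERE ID >= " + r[0] + " AND ID < " + r[1]);
        }
    }
}
```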

When transaction support is enabled with SET AUTOCOMMIT OFF, lists of all insert, delete or update operations are stored in memory so that they can be undone when ROLLBACK is issued. Transactions that span hundreds of modifications to data will take up a lot of memory until the next COMMIT or ROLLBACK clears the list.

Most JVM implementations allocate up to a maximum amount of memory (usually 64 MB by default). This amount is generally not adequate when large memory tables are used, or when the average size of rows in cached tables is larger than a few hundred bytes. The maximum amount of allocated memory can be set on the java ... command line that is used for running HSQLDB. For example, with Sun JVM version 1.3.0 the parameter -Xmx256m increases the amount to 256 MB.
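To check which limit is actually in effect, the maximum heap size can be read from inside the JVM. This is a minimal sketch; the class name HeapCheck is hypothetical, while Runtime.maxMemory() is a standard Java call.

```java
// Hypothetical helper: reports the maximum heap the JVM will attempt to use.
class HeapCheck {

    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapBytes() / (1024 * 1024) + " MB");
    }
}
```

Running it as java -Xmx256m HeapCheck should report a value close to 256 MB; the exact number depends on the JVM.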

1.8.0 uses a fast cache for immutable objects such as Integer or String that are stored in the database. In most circumstances, this reduces the memory footprint still further as fewer copies of the most frequently-used objects are kept in memory.

Cache Memory Allocation

With CACHED tables, the data is stored on disk and only up to a maximum number of rows are held in memory at any time. The default is up to 3*16384 rows. The hsqldb.cache_scale database property can be set to alter this amount. As any random subset of the rows in any of the CACHED tables can be held in the cache, the amount of memory needed by cached rows can reach the sum of the rows containing the largest field data. For example if a table with 100,000 rows contains 40,000 rows with 1,000 bytes of data in each row and 60,000 rows with 100 bytes in each, the cache can grow to contain nearly 50,000 rows, including all the 40,000 larger rows.

An additional property, hsqldb.cache_size_scale, can be used in conjunction with the hsqldb.cache_scale property. This puts a limit in bytes on the total size of rows that are cached. When the default values are used for both properties, the limit on the total size of rows is approximately 50MB. (This is the size of binary images of the rows and indexes. It translates to more actual memory, typically 2-4 times, used for the cache because the data is represented by Java objects.)
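The relationship between the two properties can be sketched as arithmetic. The code below assumes that both properties are base-2 exponents: the row limit is 3 * 2^cache_scale (which gives the default of 3*16384 for a value of 14) and the byte limit is that row count multiplied by 2^cache_size_scale (default 10). The class and method names are hypothetical.

```java
// Hypothetical helper: cache limits derived from the two cache properties,
// assuming both are base-2 exponents.
class CacheLimits {

    // Maximum number of rows held in the cache: 3 * 2^cacheScale.
    static long maxRows(int cacheScale) {
        return 3L << cacheScale;
    }

    // Limit on the total binary size of cached rows: maxRows * 2^cacheSizeScale bytes.
    static long maxBytes(int cacheScale, int cacheSizeScale) {
        return maxRows(cacheScale) << cacheSizeScale;
    }

    public static void main(String[] args) {
        System.out.println(maxRows(14));      // 49152 rows, i.e. 3*16384
        System.out.println(maxBytes(14, 10)); // 50331648 bytes, roughly 50MB
        System.out.println(maxBytes(14, 8));  // 12582912 bytes, a quarter of the default
    }
}
```

Each step of one in hsqldb.cache_size_scale halves or doubles the byte limit, so lowering it by two divides the limit by four.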

If memory is limited, the hsqldb.cache_scale or hsqldb.cache_size_scale database properties can be reduced. In the example above, if the hsqldb.cache_size_scale is reduced from 10 to 8, then the total binary size limit is reduced from 50MB to 12.5MB. This will allow the number of cached rows to reach 50,000 small rows, but only 12,500 of the larger rows.

Managing Database Connections

In all running modes (server or in-process) multiple connections to the database engine are supported. In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients.

Connection pooling software can be used to connect to the database but it is not generally necessary. With other database engines, connection pools are used for reasons that may not apply to HSQLDB.

  • To allow new queries to be performed while a time-consuming query is being performed in the background. This is not possible with HSQLDB 1.8.0 as it blocks while performing the first query and deals with the next query once it has finished it. This capability is under development and will be introduced in a future version.
  • To limit the maximum number of simultaneous connections to the database for performance reasons. With HSQLDB this can be useful only if your application is designed in a way that opens and closes connections for each small task.
  • To control transactions in a multi-threaded application. This can be useful with HSQLDB as well. For example, in a web application, a transaction may involve some processing between the queries or user action across web pages. A separate connection should be used for each HTTP session so that the work can be committed when completed or rolled back otherwise. Although this usage cannot be applied to most other database engines, HSQLDB is perfectly capable of handling over 100 simultaneous HTTP sessions as individual JDBC connections.

An application that is not both multi-threaded and transactional, such as an application for recording user login and logout actions, does not need more than one connection. The connection can stay open indefinitely and needs to be reopened only when it is dropped due to network problems.
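The single-connection pattern can be sketched as a small holder that reuses one connection and opens a new one only when the old one is no longer usable. This is an illustration, not part of HSQLDB: the SingleConnection class, its Opener interface and the URL in main are hypothetical. Note that Connection.isClosed() only reports connections that are already known to be closed, so an application may also need to react to an SQLException raised by a dropped connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical holder: keeps one JDBC connection open and reopens it on demand.
class SingleConnection {

    // Supplies a fresh connection; typically wraps DriverManager.getConnection.
    interface Opener {
        Connection open() throws SQLException;
    }

    private final Opener opener;
    private Connection current;

    SingleConnection(Opener opener) {
        this.opener = opener;
    }

    // Returns the existing connection, or opens a new one if there is none
    // or the existing one reports that it is closed.
    synchronized Connection get() {
        try {
            if (current == null || current.isClosed()) {
                current = opener.open();
            }
            return current;
        } catch (SQLException e) {
            throw new IllegalStateException("could not open the connection", e);
        }
    }

    public static void main(String[] args) {
        // The URL, user and password are placeholders for a real server database.
        SingleConnection holder = new SingleConnection(
                () -> DriverManager.getConnection("jdbc:hsqldb:hsql://dbhost/xdb", "sa", ""));
        System.out.println("Holder created; the connection is opened on the first call to get().");
    }
}
```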

When using an in-process database with versions prior to 1.7.2 the application program had to keep at least one connection to the database open, otherwise the database would have been closed and further attempts to create connections could fail. This is not necessary since 1.7.2, which does not automatically close an in-process database that is opened by establishing a connection. An explicit SHUTDOWN command, with or without an argument, is required to close the database. In version 1.8.0 a connection property can be used to revert to the old behaviour.

When using a server database (and to some extent, an in-process database), care must be taken to avoid creating and dropping JDBC Connections too frequently. Failure to observe this will result in unsuccessful connection attempts when the application is under heavy load.

Upgrading Databases

Any database not produced with the release version of HSQLDB 1.8.0 must be upgraded to this version. This includes databases created with the RC versions of 1.8.0. The instructions under the Upgrading Using the SCRIPT Command section should be followed in all cases.

Once a database is upgraded to 1.8.0, it can no longer be used with Hypersonic or previous versions of HSQLDB.

There may be some potential legacy issues in the upgrade which should be resolved by editing the .script file:

  • Version 1.8.0 does not accept duplicate names for indexes that were allowed before 1.7.2.
  • Version 1.8.0 does not accept duplicate names for table columns that were allowed before 1.7.0.
  • Version 1.8.0 does not create the same type of index for foreign keys as versions before 1.7.2.
  • Version 1.8.0 does not accept table or column names that are SQL identifiers without double quoting.

Upgrading Using the SCRIPT Command

To upgrade from 1.7.2 or 1.7.3 to 1.8.0, simply issue the SET SCRIPTFORMAT TEXT and SHUTDOWN SCRIPT commands with the old version, then open with the new version of the engine. The upgrade is then complete.

To upgrade from older version database files (1.7.1 and older) that do not contain CACHED tables, simply SHUTDOWN with the older version and open with the new version. If there is any error in the .script file, try again after editing the .script file.

To upgrade from older version database files (1.7.1 and older) that contain CACHED tables, use the SCRIPT procedure below. In all versions of HSQLDB and Hypersonic 1.43, the SCRIPT 'filename' command (used as an SQL query) allows you to save a full record of your database, including database object definitions and data, to a file of your choice. You can export a script file using the old version of the database engine and open the script as a database with 1.8.0.

Procedure 5.1. Upgrade Using SCRIPT procedure

  1. Open the original database in the old version of DatabaseManager.
  2. Issue the SCRIPT command, for example SCRIPT 'newversion.script', to create a script file containing a copy of the database.
  3. Use the 1.8.0 version of DatabaseManager to create a new database, in this example 'newversion', in a different directory.
  4. SHUTDOWN this database.
  5. Copy the newversion.script file from step 2 over the file of the same name for the new database created in step 3.
  6. Try to open the new database using DatabaseManager.
  7. If there is any inconsistency in the data, the script line number is reported on the console and the opening process is aborted. Edit and correct any problems in the newversion.script before attempting to open again. Use the guidelines in the next section (Manual Changes to the .script File). Use a programming editor that is capable of handling very large files and does not wrap long lines of text.
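The copy in step 5 can be done with any file manager or with a few lines of Java, as sketched below. The class name and the command-line arguments are hypothetical; the new database must have been shut down first, as in step 4.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for step 5: overwrite the new database's .script file
// with the script exported from the old database.
class CopyScriptStep {

    // Copies exportedScript over <newDbDir>/<dbName>.script and returns the target path.
    static Path overwriteScript(Path exportedScript, Path newDbDir, String dbName) throws IOException {
        Path target = newDbDir.resolve(dbName + ".script");
        return Files.copy(exportedScript, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: CopyScriptStep <exported.script> <new-db-directory> <db-name>");
            return;
        }
        Path target = overwriteScript(Paths.get(args[0]), Paths.get(args[1]), args[2]);
        System.out.println("Replaced " + target);
    }
}
```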

Manual Changes to the .script File

In 1.8.0 the full range of ALTER TABLE commands is available to change the data structures and their names. However, if an old database cannot be opened due to data inconsistencies, or the use of index or column names that are not compatible with 1.8.0, manual editing of the SCRIPT file can be performed.

The following changes can be applied so long as they do not affect the integrity of existing data.

  • Names of tables, columns and indexes can be changed.
  • CREATE UNIQUE INDEX ... to CREATE INDEX ... and vice versa
    A unique index can always be converted into a normal index. A non-unique index can only be converted into a unique index if the table data for the column(s) is unique in each row.
  • NOT NULL
    A not-null constraint can always be removed. It can only be added if the table data for the column has no null values.
  • PRIMARY KEY
    A primary key constraint can be removed or added. It cannot be removed if there is a foreign key referencing the column(s).
  • COLUMN TYPES
    Some changes to column types are possible. For example an INTEGER column can be changed to BIGINT, or DATE, TIME and TIMESTAMP columns can be changed to VARCHAR.

After completing the changes and saving the modified *.script file, you can open the database as normal.

Backing Up Databases

The data for each database consists of up to 5 files in the same directory. The endings are *.properties, *.script, *.data, *.backup and *.log (a file with the *.lck ending is used for controlling access to the database and should not be backed up). These should be backed up together. The files can be backed up while the engine is running but care should be taken that a CHECKPOINT or SHUTDOWN operation does not take place during the backup. It is more efficient to perform the backup immediately after a CHECKPOINT.

The *.data file can be excluded from the backup. In this case, when restoring, a dummy *.data file is needed, which can be an empty, 0 length file. The engine will expand the *.backup file to replace this dummy file if the backup is restored. If the *.data file is not backed up, the *.properties file may have to be modified to ensure it contains modified=yes instead of modified=no prior to restoration. If a backup immediately follows a checkpoint, then the *.log file can also be excluded, reducing the significant files to *.properties, *.script and *.backup.

Normal backup methods, such as archiving the files in a compressed bundle, can be used.
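As one possible way of archiving the files in a compressed bundle, the sketch below writes the database files that exist into a zip archive and leaves out the *.lck file. The class name, method name and command-line arguments are hypothetical; the code does not coordinate with the engine, so it remains the caller's responsibility to ensure that no CHECKPOINT or SHUTDOWN runs while it is copying.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: bundles the files of one database into a zip archive.
class DbBackup {
    // The five endings listed in the text; the .lck file is deliberately not included.
    static final String[] SUFFIXES = {".properties", ".script", ".data", ".backup", ".log"};

    // Adds every existing <dbName><suffix> file to zipFile and returns how many were added.
    static int backup(Path dbDir, String dbName, Path zipFile) throws IOException {
        int count = 0;
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String suffix : SUFFIXES) {
                Path file = dbDir.resolve(dbName + suffix);
                if (!Files.isRegularFile(file)) {
                    continue; // a database does not always have all five files
                }
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: DbBackup <db-directory> <db-name> <target.zip>");
            return;
        }
        int n = backup(Paths.get(args[0]), args[1], Paths.get(args[2]));
        System.out.println("Archived " + n + " file(s)");
    }
}
```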