SCM Migration
Glossary
The exact meaning of two terms is essential for the following migration guide:
- project: a top level project, with a project lead and a separate space on the OpenOffice.org web site and in the OpenOffice.org repository. Example: gsl (the project which hosts the vcl code module, rsc the resource compiler and 16 other code modules), zh (this project hosts the Chinese language community).
- module: the next level of structure below a project. Code projects typically host several modules; language projects usually have only a www module. Note: some modules have the same name as their hosting project, for example sw is also a module in the project sw.
Repository restructuring
Whether we adopt subversion as the new SCM tool or a distributed SCM like git or mercurial, the necessary migration is also a good opportunity to restructure our repository and to do some badly needed cleanup.
This restructure and migration guide is geared towards a migration to subversion, but the same principles can and should be applied to a potential migration to another SCM tool.
Currently we have 136 top level projects. Inside these projects we have varying numbers of modules, either web content modules or code modules. Many projects are dedicated to the OpenOffice.org language communities, which are essentially independent of each other. Modules from code projects, on the other hand, are highly dependent on each other. We have about 265 of these code modules.
The idea is to move all modules containing OOo source code into a single repository. After that, each project gets its own repository, which is mostly for web content. After the migration, modules inside the new "code project" get linked into their original projects to maintain the integrity of these projects.
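The split described above can be sketched as a small planning function. This is only an illustration under assumptions: the Module record, the plan_layout name and the sample data (including a www module for gsl) are hypothetical, and the sketch does not say how the link back into the original project is realized in the new SCM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    project: str   # hosting top level project, e.g. "gsl"
    name: str      # module name, e.g. "vcl"
    is_code: bool  # True for OOo source code, False for web content


def plan_layout(modules):
    """Return (code_repo, project_repos, links) for the proposed structure."""
    code_repo = []      # modules that move into the single code repository
    project_repos = {}  # project -> modules that stay in its own repository
    links = {}          # project -> code modules linked back into the project
    for m in modules:
        # Every project gets its own repository, even if it ends up empty.
        project_repos.setdefault(m.project, [])
        if m.is_code:
            code_repo.append(m.name)
            links.setdefault(m.project, []).append(m.name)
        else:
            project_repos[m.project].append(m.name)
    return sorted(code_repo), project_repos, links


# Hypothetical sample data for illustration only.
sample = [
    Module("gsl", "vcl", True),
    Module("gsl", "rsc", True),
    Module("gsl", "www", False),
    Module("zh", "www", False),
    Module("sw", "sw", True),
]
code_repo, project_repos, links = plan_layout(sample)
```

With this sample, vcl, rsc and sw end up in the code repository, while gsl and zh keep their www modules in their own repositories and sw keeps an otherwise empty repository with a link to the sw module.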
Clean up
In 6 years we have accumulated a lot of cruft in the CVS repository. We take the opportunity to leave some dead ends out of the migration. The rule is that every released version of OOo must be represented in the new repository. Otherwise we are pretty free to define what we want to migrate and what not. Currently I plan to implement the following strategy:
- migrate all releases of OOo to the new SCM; this means release tags and branches must be preserved
- skip experimental branches and tags if they can be proven to be obsolete
- skip obviously dead parts of the repository
- skip tags and branches of all CWS whose status was integrated, finished, deleted or canceled on a cutoff date (currently 2007/05/15)
The last rule reduces the number of branches to be migrated from about 5000 to about 500.
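The CWS rule can be sketched as a simple filter. This is a minimal sketch, assuming the status of every CWS (child workspace) as recorded on the cutoff date is available as a name-to-status mapping; the function name, the sample CWS names and the non-skipped status values are hypothetical.

```python
# Statuses that mark a CWS as no longer needed, taken from the rule above.
SKIP_STATUSES = frozenset({"integrated", "finished", "deleted", "canceled"})


def cws_to_migrate(status_at_cutoff):
    """Return the sorted names of CWS whose branches and tags are kept.

    status_at_cutoff maps a CWS name to its status as recorded on the
    cutoff date (2007/05/15 in the rule above).
    """
    return sorted(
        name
        for name, status in status_at_cutoff.items()
        if status.lower() not in SKIP_STATUSES
    )


# Hypothetical snapshot for illustration only.
snapshot = {
    "cws_a": "integrated",
    "cws_b": "new",
    "cws_c": "canceled",
    "cws_d": "nominated",
}
kept = cws_to_migrate(snapshot)
```

Here only cws_b and cws_d would keep their branches and tags. Release tags and branches are not covered by this filter; they are always migrated.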
Recipe for migrating the code repository
A migration to subversion requires a fast Unix machine with cvs, subversion-1.4 and the cvs2svn python script installed.
Copy and restructure the CVS code repository
Create a copy of the CVS repository. In the following, <work> is the directory that contains the 136 OOo top level projects. Use the repositorystructure.sh script to restructure the repository and to remove obsolete and broken stuff.
$ cd <work>
$ sh repositorystructure.sh
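For a scripted run, the copy and restructure steps could be wrapped as follows. This is a sketch under assumptions: the function names and the paths passed in are hypothetical, and repositorystructure.sh is assumed to be reachable from inside the working directory, as the commands above imply.

```python
import shutil
import subprocess
from pathlib import Path


def copy_repository(source, work):
    """Copy the CVS repository tree to a fresh working directory."""
    source, work = Path(source), Path(work)
    if work.exists():
        # Never restructure on top of an older, possibly half-processed copy.
        raise FileExistsError(f"{work} already exists; refusing to overwrite")
    # symlinks=True keeps symbolic links as links instead of following them.
    shutil.copytree(source, work, symlinks=True)
    return work


def restructure(work, script="repositorystructure.sh"):
    """Run the restructuring script from inside the working copy."""
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["sh", script], cwd=work, check=True)
```

Working on a copy keeps the original CVS repository untouched, so the restructuring can be repeated from scratch if the script has to be adjusted.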