From Apache OpenOffice Wiki

Git is a popular version control system designed to handle very large projects with speed and efficiency. See for more info.

Windows users might be interested in the MinGW port of git (binaries).

Git and

A functional git tree with the entire OOo history for testing purposes is here: It is an imported CVS tree that was split into two parts:

  • The sources themselves - ooo.git
  • The 3rd party stuff (binary mozilla, zlib, berkeleydb, ...) - 3rdparty.git

The sources take about 1.3G; the 3rd party stuff takes 591M. Please follow the instructions on to get the tree.

For testing purposes, a git tree without history is also available as git:// It is a full import of src680-m211 (including the 3rd party libraries, localizations, etc.). The plan is to start the OOo git tree without history, with the possibility of 'grafting' the old history onto it (message, sample script).


The following transformations are applied during the conversion from CVS:

  • The OOo repository is split into the sources and 3rd party sources as described above
  • 'cws_src680_xyz' branches are renamed to simple 'xyz'
  • 'CWS_SRC680_XYZ_ANCHOR' tags are renamed to simple 'XYZ'
  • 'INTEGRATION: CWS xyz' commits are grouped into one commit (the CWS tooling generates them per file) and are treated as a merge in the git tree
  • Tabs at the beginning of lines are converted to 4 spaces in .c/.cxx/.h/.hxx/.mk/.src files
  • 'RESYNC:.*FILE MERGED' and 'RESYNC:.*FILE REMOVED' commits are grouped inside branches (with a single 'RESYNC' log entry)
    • This may result in multiple 'RESYNC' commits inside a branch when another commit was made in the middle of the resync
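The renaming and tab rules above are simple enough to express as small shell functions. This is only an illustrative sketch, not the converter that was actually used; the function names are hypothetical, and "converted to 4 spaces" is read literally, so each leading tab becomes four spaces (rather than expanding to the next 4-column tab stop).

```shell
# Hypothetical helpers illustrating some of the CVS->git conversion rules.

# Branch rule: 'cws_src680_xyz' -> 'xyz'
rename_cws_branch() {
    printf '%s\n' "${1#cws_src680_}"
}

# Tag rule: 'CWS_SRC680_XYZ_ANCHOR' -> 'XYZ'
rename_cws_tag() {
    tag=${1#CWS_SRC680_}
    printf '%s\n' "${tag%_ANCHOR}"
}

# Succeeds only for the file types whose leading tabs are converted.
needs_detab() {
    case $1 in
        *.c|*.cxx|*.h|*.hxx|*.mk|*.src) return 0 ;;
        *) return 1 ;;
    esac
}

# Replace every tab in a line's leading whitespace with 4 spaces;
# tabs after the first non-blank character are left untouched.
# Reads the given files (or stdin) and writes to stdout.
detab_leading() {
    awk '{
        line = $0; lead = ""
        # Walk the leading blanks, expanding each tab to four spaces.
        while ((c = substr(line, 1, 1)) == " " || c == "\t") {
            lead = lead (c == "\t" ? "    " : " ")
            line = substr(line, 2)
        }
        print lead line
    }' "$@"
}
```

For example, `detab_leading salnativewidgets-kde.cxx > salnativewidgets-kde.cxx.new` would only be run on files for which `needs_detab` succeeds.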

After creating the tree, it is worth repacking it, for example:

git repack -a -f --depth=50 --window=250

If repacking runs out of memory, its memory use can be limited:

git config pack.deltaCacheLimit 1
git config pack.deltaCacheSize 1
git config pack.windowMemory 4g


  • Convert CollabNet account names into real names
    • Maybe use the data from DomainDeveloper (completing it where necessary) if there is no easy way to extract the names from CollabNet
  • Delete merged branches (from 'heads', not from history!)
  • Provide the old history as a 'graft' - see e.g.;f=contrib/
  • Translations to a separate git tree as well?
  • URE to a separate git tree?
  • ODF Toolkit to a separate git tree?
  • The .pdf version of the Developer's Guide consumes quite a lot of space as well - any chance to do something about it?
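Deleting merged branches only removes the branch names; every commit stays reachable from the branch it was merged into. A minimal sketch, assuming a git version that supports `git for-each-ref --merged` (2.7 or later) and that it is run in the repository whose heads should be cleaned up; `delete_merged_heads` is a hypothetical name.

```shell
# delete_merged_heads TARGET
# Hypothetical helper: deletes local branch heads that are already fully
# merged into TARGET (e.g. master). Only the refs are removed; the
# commits remain part of TARGET's history.
delete_merged_heads() {
    target=$1
    git for-each-ref --format='%(refname:short)' --merged "$target" refs/heads/ |
    while read -r branch; do
        # Never delete the target branch itself.
        [ "$branch" = "$target" ] && continue
        # 'git branch -d' (not -D) refuses to drop anything unmerged.
        git branch -d "$branch"
    done
}
```

Usage would be `delete_merged_heads master`; branches with unmerged work are listed by `for-each-ref` only if merged, and `git branch -d` adds a second safety check.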



Links to comparisons of Git with other SCMs:

Comparison of git with Subversion:

Machines used for testing

CVS tests:

  •  ???

Git tests [let's call this one 'git machine' ;-)]:

  • CPU: AMD Athlon(tm) 64 Processor 3200+
  • RAM: 1G
  • Disk (info from bonnie, size 1*2000 MB):
    • Sequential output (nosync), per char: 37819 K/sec, 77.6% CPU
    • Sequential output (nosync), block: 44296 K/sec, 16.8% CPU
    • Sequential output (nosync), rewrite: 16982 K/sec, 5.1% CPU
    • Sequential input, per char: 35203 K/sec, 63.9% CPU
    • Sequential input, block: 45915 K/sec, 6.6% CPU
    • Random seeks (04k (03)): 152.4 /sec, 0.4% CPU
  • OS: SUSE 10.1
  • Filesystem: ext3
  • Net connection: ~20Mbit

SVN tests:

  •  ???


The git repository could [should! ;-)] be tuned for better results:

  • Delete integrated branches - the history will still be preserved; only the number of open heads will be reduced (by about 3000)
  • Graft history - the old development can be 'hidden' and made available, through a simple script, only to those who really need it, like;f=contrib/ . This way we can save about 1G of download!
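With current git, the grafting idea can be sketched with `git replace --graft`, which git now recommends instead of the old `.git/info/grafts` file. The sketch assumes the history-less tree has a single root commit whose content matches the tip of the old history; `graft_history` and its arguments are hypothetical.

```shell
# graft_history HISTORY_REPO HISTORY_REF
# Hypothetical helper, run inside the history-less clone: attaches the
# old history (HISTORY_REF in HISTORY_REPO) below the clone's root commit.
graft_history() {
    history_repo=$1
    history_ref=$2
    # Root commit of the history-less tree (assumes there is exactly one).
    root=$(git rev-list --max-parents=0 HEAD) || return 1
    # Fetch the old history's objects; its tip is recorded in FETCH_HEAD.
    git fetch --quiet "$history_repo" "$history_ref" || return 1
    old_tip=$(git rev-parse FETCH_HEAD) || return 1
    # Make the root appear to have the old tip as its parent. This only
    # adds a local replace ref; the original commit objects are unchanged.
    git replace --graft "$root" "$old_tip"
}
```

For example, `graft_history /path/to/ooo-history.git master` (a hypothetical path). Replace refs are not fetched or pushed by default, so only the users who run the script locally see the old history.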

The Results

  • Size of data on the server [OOo sources]:
    • CVS: 8.5G
    • git: 1.3G
  • Size of data on the server [3rd party]:
    • CVS: 1.1G
    • git: 591M
  • Size of checkout [OOo sources]:
    • CVS: 1.4G
    • git: 2.8G [files you can hack on (contains localize.sdf's) + the history]
    • SVN: 3.3G [files you can hack on + localize.sdf's from data-trunk + .svn directories]
  • Size of checkout [3rd party]:
    • CVS: 98M
    • git: 688M [files you can hack on + the history]
    • SVN: 199M [files you can hack on + .svn directories]
  • Initial checkout time [OOo sources]:
    • CVS: 117 minutes (Linux, 2MBit DSL); 26 minutes (Linux, 2MBit DSL, with compression (-z 6))
    • git: 130 minutes (51 min for a pull) (Linux, 2MBit DSL) [from]; 100 min (Linux, 2MBit DSL, wireless, no proxy) [from] (1586669 objects (counting, deltifying, indexing), 1144663 deltas to resolve); 44 min (faster machine than the [git machine], but with the same connection) [from]; 60 minutes (Windows, 34Mbit line)
    • SVN: 58 min [git machine]
  • Initial checkout time [3rd party]: (no data)
  • Branch creation:
    • git: immediately
    • SVN: immediately with a local svn server, 25 sec with the server
  • Branch switch:
    • git: <15sec [to a newly created one], 3min to an old one
    • SVN: 12min 40sec [git machine] ??
  • Diff:
    • git: immediately
    • SVN: 4min 13sec [git machine]
  • Commit:
    • git: 13-25sec
  • Merge:
    • git: 10sec [new branch with few changes], <3min [long-living branch, harder scenario]
  • Resync:
    • git: same as 'Merge' - it's a merge from 'master' to the branch.
  • Integration:
    • git: same as 'Merge' - it's a merge from a branch to 'master'.
  • Push:
    • CVS: not necessary
    • git: pushing back one branch in the local network: 9 sec; pushing back the repository: 40 min
    • SVN: not necessary

'3rd party' in this context means the following modules: agg, beanshell, berkeleydb, bitstream_vera_fonts, boost, curl, dictionaries, epm, expat, freetype, hsqldb, icu, jpeg, libwpd, libxml2, moz, msfontextract, nas, neon, np_sdk, portaudio, python, sablot, sane, sndfile, stlport, vigra, xalan, xt, zlib.
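When the tree is split, the module list above is what decides which repository a module ends up in. A sketch of that classification, with hypothetical function names and assuming each module is a top-level directory of the checkout:

```shell
# is_third_party MODULE
# Hypothetical helper: succeeds if MODULE is one of the modules listed
# above as '3rd party' (i.e. destined for 3rdparty.git).
is_third_party() {
    case $1 in
        agg|beanshell|berkeleydb|bitstream_vera_fonts|boost|curl|\
        dictionaries|epm|expat|freetype|hsqldb|icu|jpeg|libwpd|libxml2|\
        moz|msfontextract|nas|neon|np_sdk|portaudio|python|sablot|sane|\
        sndfile|stlport|vigra|xalan|xt|zlib)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# classify_modules DIR
# Prints every module directory under DIR with its target tree.
classify_modules() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue        # skip the literal glob if DIR is empty
        module=$(basename "$dir")
        if is_third_party "$module"; then
            printf '3rdparty.git %s\n' "$module"
        else
            printf 'ooo.git %s\n' "$module"
        fi
    done
}
```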

Commands used for the tests:

  • Checkout [OOo sources]:
    • CVS: cvs co OpenOffice2
    • git: git clone git:// (How does this work with a proxy?)
    • SVN: svn checkout svn (This tree does not contain the localize.sdf's; they are in trunk-data.)
  • Branch creation:
    • git: git branch test [all the following commands were issued in the subdir]
    • SVN: [all the following commands were issued in the svn subdir]
  • Branch switch:
    • git: git checkout test
    • SVN: svn switch
  • Diff:
    • git: vim vcl/unx/kde/salnativewidgets-kde.cxx [to make some changes] ; git diff
    • SVN: vim vcl/unx/kde/salnativewidgets-kde.cxx [to make some changes] ; svn diff
  • Commit [with the changes from 'Diff']:
    • git: git commit -a
  • Merge [the simple scenario]:
    • git: git branch test2 ; git checkout test2 ; vim vcl/unx/kde/salnativewidgets-kde.cxx [more changes] ; git commit -a ; git checkout test [preparation, to have something to merge]
    • git: git pull . test2 [the merge itself]
  • Merge [the harder scenario]:
    • git: git pull git:// unxsplash [an old CWS of mine, called cws_src680_unxsplash in CVS]
  • Resync:
    • git: [resyncs are usually not necessary with git; but when a branch needs a feature it depends on, it's just a merge from the remote 'master']
  • Integration:
    • git: git checkout master ; git pull . test [or alternatively: git checkout master ; git merge 'merging test into master' master test]
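For reference, `git pull . test2` fetches from the current repository (`.`) and merges, which current git does directly with `git merge test2`. The sketch below replays the simple merge scenario and the integration step in a scratch repository; the appended comment line stands in for the vim edit, `test`/`test2` are the branch names used above, and `merge_scenario` is a hypothetical name.

```shell
# merge_scenario BASE FILE
# Hypothetical helper: replays the "simple merge" and "integration" steps
# in the current repo. BASE is the main branch ('master' on this page)
# and FILE is any tracked file.
merge_scenario() {
    base=$1
    file=$2
    git checkout -q -b test "$base" || return 1   # the feature branch
    git checkout -q -b test2 || return 1          # branch off the branch
    echo '// another change' >> "$file"           # stands in for the vim edit
    git commit -q -a -m 'change on test2' || return 1
    git checkout -q test || return 1
    git merge -q --no-edit test2 || return 1      # same as: git pull . test2
    git checkout -q "$base" || return 1
    git merge -q --no-edit test                   # integration: git pull . test
}
```

In this replay both merges are fast-forwards, since neither target branch moved in the meantime.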
