Git is a popular version control system designed to handle very large projects with speed and efficiency. See http://git.or.cz/ for more info.
Git and OpenOffice.org
A functional git tree with the entire OOo history for testing purposes is here: http://go-oo.org/git. It is an imported CVS tree that was split into two parts:
- The sources themselves - ooo.git
- The 3rd party stuff (binary mozilla, zlib, berkeleydb, ...) - 3rdparty.git
The size of the sources is about 1.3G, the size of the 3rd party stuff is 591M. Please follow the instructions on http://go-oo.org/git to get the tree.
For testing purposes, a git tree without history is also available at git://go-oo.org/git/without-history/src680-m211.git. It is a full import of src680-m211 (with the 3rd party libraries, localizations, etc.). The plan is to start the OOo git tree as a tree without history, with the possibility to 'graft' the history onto it (message, sample script).
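A minimal sketch of what such a graft could look like, assuming a git version that provides `git replace --graft` (the older `.git/info/grafts` file is another way to get the same effect). The function name `graft_history` and its two arguments are hypothetical; it is meant to be run inside the clone that has no history.

```shell
#!/bin/sh
# Sketch only: attach old history beneath the root commit of a tree
# that was imported without history.
# Assumptions: $1 is a URL or path of a repository holding the old
# history, $2 is the ref in it whose tip should become the parent of
# our root commit. 'graft_history' is a hypothetical helper name.
graft_history() {
    old_repo=$1
    old_ref=$2
    # Fetch the old history; nothing is merged into the current branch
    git fetch "$old_repo" "$old_ref" || return 1
    old_tip=$(git rev-parse FETCH_HEAD) || return 1
    # Find the parentless (root) commit of the current branch
    root=$(git rev-list --max-parents=0 HEAD | tail -n 1) || return 1
    # Record a replacement that makes $root appear to have $old_tip as parent
    git replace --graft "$root" "$old_tip"
}
```

After something like `graft_history /path/to/old-history.git master`, commands such as `git log` walk from the new tree straight into the old history, while people who never fetch the old history are not affected.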
These transformations are done while converting from CVS:
- The OOo repository is split into the sources and 3rd party sources as described above
- 'cws_src680_xyz' branches are renamed to simple 'xyz'
- 'CWS_SRC680_XYZ_ANCHOR' tags are renamed to simple 'XYZ'
- 'INTEGRATION: CWS xyz' commits are grouped into one commit (they are generated by CWS tooling per-file), and treated as a merge in the git tree
- Tabs at the beginning of lines are converted to 4 spaces in .c/.cxx/.h/.hxx/.mk/.src files
- 'RESYNC:.*FILE MERGED' and 'RESYNC:.*FILE REMOVED' commits are grouped inside branches (with a single 'RESYNC' log entry)
- This may result in multiple 'RESYNC' commits inside the branch when another commit happened in the middle of the resync
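The renaming and tab rules in the list above can be sketched with plain `sed`. This is not the actual conversion tooling; the function names are hypothetical, and the tab rule assumes that every tab in the leading whitespace simply becomes 4 spaces.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers illustrating the renaming rules.

# 'cws_src680_xyz' -> 'xyz'
rename_branch() {
    printf '%s\n' "$1" | sed 's/^cws_src680_//'
}

# 'CWS_SRC680_XYZ_ANCHOR' -> 'XYZ'
rename_tag() {
    printf '%s\n' "$1" | sed -e 's/^CWS_SRC680_//' -e 's/_ANCHOR$//'
}

# Replace every tab in the leading whitespace of each line with 4 spaces
# (assumption: no tab-stop alignment). Tabs after the first non-blank
# character are left alone. Reads stdin, writes stdout.
expand_leading_tabs() {
    tab=$(printf '\t')
    # Loop: while a tab follows only spaces at line start, expand it
    sed -e ':a' -e "s/^\( *\)${tab}/\1    /" -e 'ta'
}
```

For example, `rename_branch cws_src680_unxsplash` prints `unxsplash`, and `expand_leading_tabs < file.cxx` writes the converted file to standard output.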
After creating the tree, it is worth repacking it, for example:
git repack -a -f --depth=50 --window=250
If the repack runs out of memory, its memory use can be limited:
git config pack.deltaCacheLimit 1
git config pack.deltaCacheSize 1
git config pack.windowMemory 4g
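The memory limits and the repack can be combined into one step. The following is a sketch under assumptions: `repack_limited` is a hypothetical helper, the settings are written to the repository's own config, and `git count-objects -v` is added only to show the resulting pack size.

```shell
#!/bin/sh
# Sketch only: apply the memory limits, repack, then report pack statistics.
# $1 = path to the repository, $2 = optional pack.windowMemory value
# (defaults to the 4g used above). 'repack_limited' is a hypothetical name.
repack_limited() {
    repo=$1
    window_memory=${2:-4g}
    git -C "$repo" config pack.deltaCacheLimit 1 &&
    git -C "$repo" config pack.deltaCacheSize 1 &&
    git -C "$repo" config pack.windowMemory "$window_memory" &&
    # Same repack options as above, with -q to silence progress output
    git -C "$repo" repack -q -a -f --depth=50 --window=250 &&
    # Prints, among other lines, 'size-pack: <KiB>'
    git -C "$repo" count-objects -v
}
```

Called as `repack_limited openoffice.org 1g`, it would use a smaller window memory on a machine with little RAM.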
- Convert CollabNet account names into real names
- maybe use the data from DomainDeveloper (complete that where necessary) if there's no easy way to extract the names from CollabNet
- Delete merged branches (from 'heads', not from history!)
- Provide the oldest history as a 'graft' - see e.g. http://repo.or.cz/w/elinks.git?a=blob;f=contrib/grafthistory.sh
- Translations to a separate git tree as well?
- URE to a separate git tree?
- ODF Toolkit to a separate git tree?
- The .pdf version of the developer's guide consumes quite some space as well - any chance to do something about it?
Links to Git comparison with other SCMs: http://git.or.cz/gitwiki/GitLinks#comparison
Comparison of git with Subversion: http://git.or.cz/gitwiki/GitSvnComparsion
Machines used for the testing
Git tests [let's call this one 'git machine' ;-)]:
- CPU: AMD Athlon(tm) 64 Processor 3200+
- RAM: 1G
- Disk (info from bonnie):
              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
one    1*2000 37819 77.6 44296 16.8 16982  5.1 35203 63.9 45915  6.6 152.4  0.4
- OS: SUSE 10.1
- Filesystem: ext3
- Net connection: ~20Mbit
The git repository could [should! ;-)] be tuned for better results:
- Delete integrated branches - the history will still be preserved, only the number of open heads will be reduced (by about 3000)
- Graft history - the old development can be 'hidden' and available just to those who really need it using a simple script, like http://repo.or.cz/w/elinks.git?a=blob;f=contrib/grafthistory.sh . This way we can save about 1G of download!
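As a sketch of the first point, branches that are already fully merged can be removed from the heads without touching history. The helper name `delete_merged_branches` is hypothetical, and the sketch assumes that the main branch (here 'master') is the one currently checked out.

```shell
#!/bin/sh
# Sketch only: delete local branch heads that are fully merged into the
# main branch. The commits stay reachable from the main branch, so no
# history is lost. 'delete_merged_branches' is a hypothetical name.
delete_merged_branches() {
    main=${1:-master}
    # List local branches whose tips are reachable from $main
    git for-each-ref --format='%(refname:short)' --merged="$main" refs/heads/ |
    while read -r branch; do
        # Never delete the main branch itself
        [ "$branch" = "$main" ] && continue
        # -d (not -D) makes git refuse to delete anything unmerged
        git branch -d "$branch"
    done
}
```

Using `git branch -d` rather than `-D` is a deliberate safety choice: if a branch turns out not to be merged, git refuses to delete it.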
| ||CVS||git||SVN|
|Size of data on the server [OOo sources]||8.5G||1.3G|| |
|Size of data on the server [3rd party]||1.1G||591M|| |
|Size of checkout [OOo sources]||1.4G||2.8G [files you can hack on (contains localize.sdf's) + the history]||3.3G [files you can hack on + localize.sdf's from data-trunk + .svn directories]|
|Size of checkout [3rd party]||98M||688M [files you can hack on + the history]||199M [files you can hack on + .svn directories]|
|Initial checkout time [OOo sources]||117 minutes (Linux, 2MBit DSL), 26 minutes (Linux, 2MBit DSL, with compression (-z 6))||130 minutes (51 min for a pull) (Linux, 2MBit DSL) [from go-oo.org]; 100 min (Linux, 2MBit DSL, Wireless, no proxy) [from go-oo.org] (1586669 objects (counting, deltifying, indexing), 1144663 deltas to resolve)||60 minutes (Windows, 34Mbit Line); 58 min [git machine]|
|Initial checkout time [3rd party]|| || || |
|Branch creation|| ||Immediately||Immediately with local svn server, 25 sec with collab.net server|
|Branch switch|| ||<15sec [to newly created], 3min to an old one||12min 40sec [git machine] ??|
|Diff|| ||Immediately||4min 13sec [git machine]|
|Merge|| ||10sec [new branch with few changes], <3min [long living branch, harder scenario]|| |
|Resync|| ||Same as 'Merge' - it's a merge from 'master' to the branch.|| |
|Integration|| ||Same as 'Merge' - it's a merge from a branch to the 'master'.|| |
|Push||Not necessary||push back one branch in local network: 9 sec, push back repository 40 min||Not necessary|
'3rd party' in this context means the following modules: agg, beanshell, berkeleydb, bitstream_vera_fonts, boost, curl, dictionaries, epm, expat, freetype, hsqldb, icu, jpeg, libwpd, libxml2, moz, msfontextract, nas, neon, np_sdk, portaudio, python, sablot, sane, sndfile, stlport, vigra, xalan, xt, zlib.
Commands used for the tests:
| ||CVS||git||SVN|
|checkout [OOo sources]||cvs -d:pserver:email@example.com:/cvs co OpenOffice2||git clone git://go-oo.org/git/openoffice.org/ooo.git openoffice.org (How does this work with a proxy?)||svn checkout http://svn.stage.openoffice.org/svn/svn/trunk svn (This tree does not contain localize.sdf's, they are in trunk-data.)|
|Branch creation|| ||[all the following commands were issued in the openoffice.org subdir] git branch test||[all the following commands were issued in the svn subdir]|
|Branch switch|| ||git checkout test||svn switch http://svn.stage.openoffice.org/svn/svn/vendors/sun-cvs/tags/SRC680_m172|
|Diff|| ||vim vcl/unx/kde/salnativewidgets-kde.cxx [to do some changes] ; git diff||vim vcl/unx/kde/salnativewidgets-kde.cxx [to do some changes] ; svn diff|
|Commit|| ||[with the changes from 'Diff'] git commit -a|| |
|Merge [the simple scenario]|| ||git branch test2 ; git checkout test2 ; vim vcl/unx/kde/salnativewidgets-kde.cxx [other changes] ; git commit -a ; git checkout test [preparation to have something to merge] ; git pull . test2 [the merge itself]|| |
|Merge [the harder scenario]|| ||git pull git://go-oo.org/git/openoffice.org/ooo.git unxsplash [an old CWS of mine - called cws_src680_unxsplash in the CVS]|| |
|Resync|| ||[it's usually not necessary to do resyncs with git; but when needed to get a feature a branch would depend on, it's just a merge from remote 'master']|| |
|Integration|| ||git checkout master ; git pull . test [or alternatively: git checkout master ; git merge 'merging test into master' master test]|| |
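To repeat measurements like the ones above, each command can be wrapped in a small timing helper. This is only a sketch: `timed` is a hypothetical function, it measures whole seconds with `date +%s`, and it is not how the numbers on this page were obtained.

```shell
#!/bin/sh
# Sketch only: run one command, discard its output, and print how many
# whole seconds it took. 'timed' is a hypothetical helper, not part of git.
timed() {
    label=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s"
    # Pass the exit status of the measured command back to the caller
    return "$status"
}
```

In the openoffice.org subdir this could be used as `timed 'Branch creation' git branch test` followed by `timed 'Branch switch' git checkout test`.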