Difference between revisions of "Infrastructure Problems"

From Apache OpenOffice Wiki
(Collab.net services)
(34 intermediate revisions by 11 users not shown)
Line 1: Line 1:
__TOC__
+
== OOo's SourceCast Instance ==
== Developers Only ==
+
  
This article is about http://collab.net/ from a developers perspective; if you are not a developer - it would be appreciated if you do not edit it. We aim to be factually correct, if this is not the case please add comments.
+
The OOo instance of SourceCast is based on an extremely old version (2.6), the equivalent up-to-date product is 'CollabNet Enterprise Edition'. It is believed that many problems of the current infrastructure are fixed in the latest versions of this (ver 4.0). The use of 'SourceCast' hereinafter referrs to the (patched & older) OOo version of this infrastructure.
It should be stated at the outset that no-one editing this has any commercial advantage from criticising collab.net beyond of course improving the OO.o situation.
+
  
It is hoped that as these issues are fixed this page will eventually become a glowing testimonial to the success & efficiency of collab.net.
+
SourceCast provided services are typically extremely unresponsive - it typically taking longer to log-into SourceCast than search the entire-web at google.com with some complex search. High latencies are also sporadic - there are unpredictable periods of low & high latency.
  
== Overview ==
+
=== Scaling issues ===
  
The OpenOffice.org 'community' is (broadly) a failed community. It resembles the 'Freedows' community of the early days - ie. a public joke - full of people talking, structures, project-leads, roles but only a tiny fraction of the people involved, accounting for a handful or two are developers. Of course, this is not un-fixable, and the situation improves daily but this is where we are now.
+
Under heavy load - such as close to a release - it is common for SourceCast to become almost totally unresponsive & unusable, sometimes for days.
  
As such I view collab.net's attempts to build a meaningful developer community as a near complete failure. The blame for this is clearly split between Sun & collab.net in a way that is hard to partition. This document merely aims to explore some of the collab.net specific problems. Other common problems faced by people are enumerated [[ooo-build|here]]
+
=== Constant re-login ===
  
== Collab.net services ==
+
For unknown reasons SourceCast login infrastructure [http://qa.openoffice.org/issues/show_bug.cgi?id=34822 requires] that you re-log-in when closing and restarting your browser. It's hard to quantify quite how frequently, but normally for every new bug filed it is necessary to go to the IssueZilla page, log-in, hit back, hit refresh - adding (if you're lucky) 15seconds or so to any bug filing - prolly more. This latency is not present with other bugzilla derived products. Of course, developers Issue access patterns are typically not those of the casual tester, or heavy SourceCast user - they spend an hour or more fixing a bug, then return to mark the bug fixed (sometimes from one of their other desktop machines) - at which point; re-login is forced on them. A persistant, long-lived client-side authentication cookie would remove this problem entirely.
  
=== Source Cast ===
+
Some people report not being able to reproduce this frequent re-login issue; it is possible the bug relates closely to the end users' network topology, such as NAT gateways etc. Or - as a serious discussion without exaggeration seems to reveal - it's just that a session cookie is used and not a persistent one.
  
This is perhaps one of the most annoying pieces of software I've worked with in a long time.
+
=== CVS ===
Most OO.o services are provided by a single instance of SourceCast, running on 1 (Solaris) machine.
+
It is not clear that this is an advert for the performance of Solaris - however, it is clear that
+
some persistantly silly mis-configurations lurk there.
+
  
SourceCast provided services are typically extremely unresponsive - it typically taking longer to log-into
+
CVS is <b>extremely slow </b> ({{Bug|24771}}). This problem is compounded by the OO.o source code being extremely large it is true.  
SourceCast than search the entire-web at google.com with some complex search. High latencies are also
+
However an order of magnitude slow-down is due to a simple source-cast design bug of having the CVS .rcs files on a different disk to that of the CVS daemon itself - adding untold latency to each NFS file operation.
sporadic - there are unpredictable periods of low &amp; high latency.
+
  
==== Scaling issues ====
+
Access control - the CVS repository was by default constructed in such a way as to deny even those granted access commit writes to large chunks of it. This was coupled to the formal role request/granting process. Similarly the CVS structure itself (split into separate top-level modules per-project) is confusing (not matching the source directory layout),  also making it not possible to have a 'familiar' structure, configure / autogen.sh / README / BUILDING etc. in the top-level directory. [ at least without breaking other CVS operations ]. This artefact also makes the real CVS structure (as seen in LXR etc.) hard to navigate &amp; confusingly different.
  
Under heavy load - such as close to a release - it is common for SourceCast to become almost totally unresponsive &amp; unusable, sometimes for days.
+
CVS also has rather a [http://qa.openoffice.org/issues/show_bug.cgi?id=23306 habit] of loosing cvs tags cf. the [[CvsFAQ]]
  
==== Constant re-login ====
+
== Searching ==
  
The 31337 ultra-secure TM SourceCast login infrastructure requires that you re-log-in very frequently. It's hard to quantify quite how frequently, but normally for every new bug filed it is necessary to go to the IssueZilla page, log-in, hit back, hit refresh - adding (if you're lucky) 15seconds or so to any bug filing - prolly more. This latency is not present with other bugzilla derived products.
+
There is no ability to search openoffice.org without logging in, there's no good reason for that. More seriously googling doesn't show any results from within the mailing lists! Making them nigh on useless as a resource.  
  
==== CVS ====
+
{{Bug|58310}} has been raised to address this issue. The Googlebot web crawler is well behaved, and respects the overly restrictive 'robots.txt' files scattered around the OpenOffice.org website, such as this one: http://tools.openoffice.org/source/browse/tools/www/robots.txt.
 +
<br>
 +
The consequences of a [http://qa.openoffice.org/robots.txt reduced robots.txt] file for [[Infrastructure_Problems#Scaling_issues|scaling issues]] are checked for the qa project starting 2005-11-21.
  
CVS is <b>extremely</b> slow. This problem is compounded by the OO.o source code being extremely large it is true.  
+
Hint: as a workaround, for fast and easy mailing list searches and a threaded archive go to mail-archive.com, e.g. http://www.mail-archive.com/dev@openoffice.org/, combined with Google this leads to something like ''site:mail-archive.com/dev@openoffice.org YourSearchTerm'', which works in many cases. Of course there are also others like Gmane and such.
However an order of magnitude slow-down is due to a simple source-cast design bug of having the CVS .rcs files on a different disk to that of the CVS daemon itself - adding untold latency to each NFS file operation.
+
  
Access control - the CVS repository was by default is constructed in such a way as to deny even those granted access commit writes to large chunks of it. This was coupled to the formal role request/granting process. Similarly the CVS structure itself (split into separate top-level modules per-project) is confusing (not matching the source directory layout),  also making it not possible to have a 'familiar' structure, configure / autogen.sh / README / BUILDING etc. in the top-level directory. [ at least without breaking other CVS operations ]
+
== Projects &amp; roles ==
  
==== Projects &amp; roles ====
+
There is an extremely formal project / role structure built on SourceCast's features in this area, with various roles being requested, and granted via E-mail round-trips. Thankfully this doesn't impact CVS access these days - with broadly unconstrained CVS accounts being the norm.
  
There is an extremely formal project / role structure built on SourceCast's features in this area. Unfortunately extremely formalised structures, with roles 'project leads' etc. is inimical to the rapid transition of this large code base towards more external contribution, influence, interest etc.
+
== IssueZilla ==
  
Worse - in order to contribute it is necessary to be 'granted' a role; which occurs via a length E-mail round-trip to the project-lead, further hindering development. ( This is broadly fixed these days by creating more un-constrained accounts in the 1st instance ).
+
This is rather old and nasty compared with the excellent modern Bugzilla releases, that shows in lots of places - file typing, uploads, comment management, well - tens of usability &amp; cleanup features missing. {{Bug|34665}} contains a good sample of such problems.
  
==== IssueZilla ====
+
== Mailing lists ==
  
=== Community Management ===
+
It is critical in any new Free software project to attract developers. One way to drive away newbies is to have an unstated rule that anyone who wants to get a reply from a mailing list post needs to add "please CC me I'm not subscribed - and retain this message so the rest of the thread reaches me" in a prominent place in their E-mail. Thus (I imagine) people regularly ask a question on a list, and <i>receive</i> no reply, even if one is written.
  
Collab.net provides the services of Louis Suarez-Potts as a 'community manager'. Louis is an excellent, engaging &amp; pleasant person - this is not a personal critique.
+
Futhermore address munging is a hugely bad idea for busy mailing lists - it is not possible to read all (busy) mailing lists on topics that people are interested in in linear time; hence keeping a thread CC'd to one is important, it allows a quick response - while keeping the mailing list informed. This is not possible with collab-net's Reply-To: mangling - hence discouraging busy people from using or CC'ing the mailing lists.
However - it has to be pointed out that Louis is not a developer.
+
  
This has several unfortunate consequences - when speaking at conferences worldwide, it is difficult to generate the vital 1-to-1 personal bonds that are the real substance of community, to empathise &amp; assist with development issues, to encourage the growth of a true developer community. It is also possible that this leads to exacerbating other structural, and proceedural problems, and over-dependence on formal process rather than developer-to-developer relational interaction, an unhealthily centralized set of relations, and a lack of meritocratic accountability; where merit is based on hard contribution: code, translation, etc.
+
Also Reply-To: mangling is just a [http://www.unicom.com/pw/reply-to-harmful.html bad idea], cf. Linux Kernel, GNOME et. al's non-invasive, non-munging policies that encourage contributors &amp; build collaboration.
  
 +
It may be that one reason behind this mangling is to discourage people from replying off-list to people, - that unfortunately stifles community by not building strong inter-personal relationships, (although clearly off-list replies tend to be short, punchy, amusing, and not the norm). Another reason may be that there exist people out there who don't know how to use their mailer's reply-to-all feature.
  
=== Subversion development ===
+
It is believed that underneath SourceCast uses [http://cr.yp.to/ezmlm.html ezmlm] to handle mail.
 
+
It must be said - that the funding of the development of subversion is an unmitigated good, and goes some (small) way to providing a feel-good factor for collab.net. OTOH it's not clear that it will scale to the OO.o's use-case.
+
  
 
== Missing services ==
 
== Missing services ==
 
=== Wiki ===
 
 
It is (apparently) possible for 'anyone' to edit the project pages in source-cast; however one has to engage with this formal role based process and argue with whomever arrived there first and got a 'project lead - obstruction project' role; get cvs commit access etc.
 
 
A wiki provides way more freedom for people to get involved with editing content &amp; thus substantially lowering the barrier to improving documentation, etc. Indeed - a wiki (it may be argued) is a great re-application of traditional Free-software liberal attitudes to code contribution into the field of web/docs. Unfortunately those traditional liberal principles tend not to be applied in OO.o
 
  
 
=== LXR / Bonsai / Tinderbox ===
 
=== LXR / Bonsai / Tinderbox ===
Line 77: Line 64:
 
=== RSS aggregator / 'planet' ===
 
=== RSS aggregator / 'planet' ===
  
== Other problems ==
+
It's not clear why it is not possible eg. to re-direct .openoffice.org domain names to existing solutions here as elsewhere.
 
+
=== Inflexibility ===
+
 
+
It is (apparently) not possible to have the planet.openoffice.org DN re-directed to the existing, planet aggregator http://planet.go-oo.org/ - it is not clear why not.
+
 
+
=== Culture of blame ===
+
 
+
When collab.net people are presented with these problems there are typically two responses
+
 
+
* it is all Sun's fault - they want it like this
+
* it is all fixed in XYZ new (but incompatible) version of SourceCast
+
  
So - quite possibly many of these issues are simply a failure of backwards compatibility of the SourceCast application, and/or a failure to manage change and migration successfully. There is almost never a recognition that many problems exist and that they need fixing.
+
== Additional Links ==
 +
[[SVNMigration|Migration to SVN]]
  
=== Demands for cash ===
+
[[Infrastructure_Requirements]]
  
Even doing quite simple tasks - such as adding existing off-the-shelf infrastructure (eg. a wiki) result in demands for more money. Unfortunately - the business is built around charging money for doing things that other people are prepared to donate for free to (typical) free-software projects. It is not clear how sustainable this business model is long term.
+
[[Infrastructure_Overview]]
  
Thrift is usually thought of as a virtue, hence it is irritating (to say the least) to know that a level of service that is generally worse than can be had for free (without an SLA), and is inflexible &amp; unfixable costs some (unknown, but presumably large) sum of money.
+
[[Category:Website]]
 +
[[Category:Build_System]]

Revision as of 03:03, 29 December 2008

OOo's SourceCast Instance

The OOo instance of SourceCast is based on an extremely old version (2.6); the equivalent up-to-date product is 'CollabNet Enterprise Edition'. It is believed that many problems of the current infrastructure are fixed in the latest version of that product (4.0). 'SourceCast' hereinafter refers to the (patched & older) OOo version of this infrastructure.

SourceCast-provided services are typically extremely unresponsive: logging into SourceCast typically takes longer than searching the entire web at google.com with some complex query. High latencies are also sporadic - there are unpredictable periods of low & high latency.

Scaling issues

Under heavy load - such as close to a release - it is common for SourceCast to become almost totally unresponsive & unusable, sometimes for days.

Constant re-login

For unknown reasons the SourceCast login infrastructure requires that you log in again after closing and restarting your browser. It's hard to quantify quite how frequently this happens, but normally for every new bug filed it is necessary to go to the IssueZilla page, log in, hit back, hit refresh - adding (if you're lucky) 15 seconds or so to any bug filing, and probably more. This latency is not present with other Bugzilla-derived products. Of course, developers' issue access patterns are typically not those of the casual tester or heavy SourceCast user - they spend an hour or more fixing a bug, then return to mark the bug fixed (sometimes from one of their other desktop machines) - at which point re-login is forced on them. A persistent, long-lived client-side authentication cookie would remove this problem entirely.

Some people report not being able to reproduce this frequent re-login issue; it is possible the bug relates closely to the end users' network topology, such as NAT gateways etc. Or - as a serious discussion without exaggeration seems to reveal - it's just that a session cookie is used and not a persistent one.
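The difference between a session cookie and a persistent one can be sketched as follows. This is only an illustration using Python's standard http.cookies module; the cookie name "auth", the token value and the 30-day lifetime are assumptions made for the example, not details of how SourceCast actually issues its cookies.

```python
from http.cookies import SimpleCookie


def auth_cookie(token, max_age=None):
    """Build the value of a Set-Cookie header for a hypothetical 'auth' cookie.

    Without max_age the browser treats it as a session cookie and drops it
    when the browser is closed; with max_age it survives a restart.
    """
    cookie = SimpleCookie()
    cookie["auth"] = token
    cookie["auth"]["path"] = "/"
    if max_age is not None:
        cookie["auth"]["max-age"] = max_age  # lifetime in seconds
    return cookie["auth"].OutputString()


# Session cookie: forgotten on browser restart, so a re-login is needed.
session_header = auth_cookie("abc123")

# Persistent cookie: kept for 30 days (an assumed, illustrative lifetime).
persistent_header = auth_cookie("abc123", max_age=30 * 24 * 3600)

print(session_header)
print(persistent_header)
```

The only difference between the two headers is the Max-Age attribute, which is what lets the browser keep the cookie across restarts.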

CVS

CVS is extremely slow (Issue 24771). It is true that this problem is compounded by the OO.o source code being extremely large. However, an order-of-magnitude slow-down is due to a simple SourceCast design bug: the CVS .rcs files are on a different disk to that of the CVS daemon itself, adding untold latency to each NFS file operation.

Access control - the CVS repository was by default constructed in such a way as to deny commit access to large chunks of it, even to those granted access. This was coupled to the formal role request/granting process. Similarly, the CVS structure itself (split into separate top-level modules per project) is confusing, as it does not match the source directory layout; it also makes it impossible to have a 'familiar' structure, with configure / autogen.sh / README / BUILDING etc. in the top-level directory (at least without breaking other CVS operations). This artefact also makes the real CVS structure (as seen in LXR etc.) hard to navigate & confusingly different.

CVS also has rather a habit of losing CVS tags, cf. the CvsFAQ.

Searching

There is no ability to search openoffice.org without logging in, and there's no good reason for that. More seriously, googling doesn't show any results from within the mailing lists, making them nigh on useless as a resource.

Issue 58310 has been raised to address this issue. The Googlebot web crawler is well behaved, and respects the overly restrictive 'robots.txt' files scattered around the OpenOffice.org website, such as this one: http://tools.openoffice.org/source/browse/tools/www/robots.txt.
The consequences of a reduced robots.txt file for scaling issues are checked for the qa project starting 2005-11-21.
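How a well-behaved crawler reacts to a restrictive versus a reduced robots.txt can be sketched with Python's standard urllib.robotparser. The two robots.txt bodies below are hypothetical stand-ins, not the contents of the real OpenOffice.org files, and the /servlets/ path is likewise only an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical overly restrictive file: every crawler is shut out entirely.
RESTRICTIVE = "User-agent: *\nDisallow: /\n"

# Hypothetical reduced file: only one (assumed) expensive path is excluded.
REDUCED = "User-agent: *\nDisallow: /servlets/\n"

URL = "http://qa.openoffice.org/issues/show_bug.cgi?id=58310"

print(allowed(RESTRICTIVE, URL))
print(allowed(REDUCED, URL))
```

Under the restrictive file a crawler that honours robots.txt never indexes the page; under the reduced one it may.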

Hint: as a workaround, for fast and easy mailing list searches and a threaded archive, go to mail-archive.com, e.g. http://www.mail-archive.com/dev@openoffice.org/. Combined with Google this leads to something like site:mail-archive.com/dev@openoffice.org YourSearchTerm, which works in many cases. Of course there are also others, such as Gmane.
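The site-restricted query from the hint above can also be built programmatically. This is a small sketch using Python's urllib.parse; the helper name archive_search_url is made up for the example, and the default list address is simply the one quoted in the hint.

```python
from urllib.parse import urlencode


def archive_search_url(term, mailing_list="dev@openoffice.org"):
    """Build a Google query restricted to one list's mail-archive.com archive."""
    query = f"site:mail-archive.com/{mailing_list} {term}"
    # urlencode escapes ':', '/', '@' and spaces so the URL is safe to paste.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = archive_search_url("YourSearchTerm")
print(url)
```

Passing a different list address, for example one of the other project lists, restricts the search to that list's archive instead.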

Projects & roles

There is an extremely formal project / role structure built on SourceCast's features in this area, with various roles being requested, and granted via E-mail round-trips. Thankfully this doesn't impact CVS access these days - with broadly unconstrained CVS accounts being the norm.

IssueZilla

IssueZilla is rather old and nasty compared with the excellent modern Bugzilla releases, and that shows in lots of places - file typing, uploads, comment management - with tens of usability & cleanup features missing. Issue 34665 contains a good sample of such problems.

Mailing lists

It is critical in any new Free software project to attract developers. One way to drive away newbies is to have an unstated rule that anyone who wants to get a reply from a mailing list post needs to add "please CC me I'm not subscribed - and retain this message so the rest of the thread reaches me" in a prominent place in their E-mail. Thus (I imagine) people regularly ask a question on a list, and receive no reply, even if one is written.

Furthermore, address munging is a hugely bad idea for busy mailing lists - it is not possible to read all the (busy) mailing lists on topics one is interested in in linear time; hence keeping a thread CC'd to oneself is important: it allows a quick response while keeping the mailing list informed. This is not possible with collab.net's Reply-To: mangling, which discourages busy people from using or CC'ing the mailing lists.

Also, Reply-To: mangling is just a bad idea, cf. the non-invasive, non-munging policies of the Linux kernel, GNOME et al. that encourage contributors & build collaboration.

It may be that one reason behind this mangling is to discourage people from replying off-list - that unfortunately stifles community by not building strong inter-personal relationships (although clearly off-list replies tend to be short, punchy, amusing, and not the norm). Another reason may be that there exist people out there who don't know how to use their mailer's reply-to-all feature.
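The effect of Reply-To: mangling on a non-subscribed poster can be sketched with Python's standard email package. This is a deliberately simplified model of what a mailer's plain 'reply' does (use Reply-To if present, otherwise From); the poster address and subject are hypothetical, and real mail clients differ in the details.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_targets(msg):
    """Addresses a plain 'reply' goes to: Reply-To if set, otherwise From."""
    headers = msg.get_all("Reply-To") or msg.get_all("From") or []
    return [addr for _name, addr in getaddresses(headers)]


def make_post(munged):
    """A list post from a hypothetical poster who is not subscribed."""
    msg = EmailMessage()
    msg["From"] = "newbie@example.org"
    msg["To"] = "dev@openoffice.org"
    msg["Subject"] = "Build question"
    if munged:
        # The list software overwrites Reply-To with the list address.
        msg["Reply-To"] = "dev@openoffice.org"
    return msg


plain = reply_targets(make_post(munged=False))
munged = reply_targets(make_post(munged=True))
print(plain)
print(munged)
```

With the munged header a plain reply reaches only the list, so a poster who is not subscribed never sees the answer unless the responder remembers to add them by hand.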

It is believed that underneath SourceCast uses ezmlm to handle mail.

Missing services

LXR / Bonsai / Tinderbox

With 8 million lines of code that no-one outside Sun is familiar with, it is essential to have some hard-core code search, change tracking and indexing functionality. Unfortunately SourceCast does not provide this, which makes it far more difficult to collaborate on developing the code.

A central, well-maintained Tinderbox server should be a prerequisite for any large project with as many complex build issues as OO.o. One is, however, provided at http://go-oo.org/tinderbox/.

RSS aggregator / 'planet'

It's not clear why it is not possible, e.g., to redirect .openoffice.org domain names to existing solutions, here as elsewhere.

Additional Links

Migration to SVN

Infrastructure_Requirements

Infrastructure_Overview
