Education ClassRoom/Previous Logs/tinderboxes

From Apache OpenOffice Wiki

[10:59] <cloph> Hi *

[10:59] <chacha_chaudhry> cloph: hi

[10:59] <cloph> I uploaded my slides - link is on the agenda-page, or go straight to http://muenchen-surf.de/lohmaier/misc/

[10:59] * lgodard (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has joined #education.openoffice.org

[10:59] <ericb2> cloph: FYI, vincent vikram informed me there is a firewall, and some of his students will read the log afterwards

[10:59] <cloph> Not many people here yet... :-) but getting more apparently...

[11:00] <ericb2> cloph: and they will ask using mail or mailing lists

[11:00] <ericb2> cloph: even IRC is difficult at some places

[11:00] <chacha_chaudhry> cloph: yes it is ... at some universities

[11:02] <cloph> Just shout "Go" when I should start/when you got the slides :-)

[11:03] <ericb2> cloph: thanks for your slides

[11:03] <ericb2> as .pdf : http://muenchen-surf.de/lohmaier/misc/All_about_Tinderbox.pdf

[11:03] <ericb2> as .odp : http://muenchen-surf.de/lohmaier/misc/All_about_Tinderbox.odp

[11:03] <ericb2> cloph: we are ready :) you can start when you want

[11:03] <chacha_chaudhry> cloph: Go

[11:03] <chacha_chaudhry> :)

[11:04] <cloph> OK then - as you already read the agenda, you know what this talk is about: Tinderbox :-)

[11:05] <cloph> You can see how the talk will proceed on the contents slide - but as you can read faster than I can type, I'll not read it to you :-) <flip/>

[11:05] <cloph> If you have questions in the meantime, don't hesitate to interrupt me, feel free to ask without raising your hand first

[11:06] <cloph> The question: "What is tinderbox?" can be answered fairly easily: It is a system that collects build statuses from various sources and displays those in a hopefully clear and easy to understand way.

[11:07] <cloph> This is the basic task that tinderbox has. To reach that goal, it has other features, like integration with bonsai (or other tools).

[11:07] <chacha_chaudhry> sources means ? -- various platforms or OS?

[11:07] <cloph> In that case it means both.

[11:08] <cloph> There are multiple clients that build the code, those clients run on different OS/Platforms, have a different build-setup and build different cws

[11:09] <cloph> So while there might be two builders that run linux, one can use Sun's JDK, the other can use gcj, or one can use gcc 3.4, the other gcc 4.3 - that sometimes can make a big difference.

[11:10] <cloph> For those who don't know what bonsai is: Bonsai is a tool that collects commit-information, it is a more advanced "CVS viewer" - it allows you to query for commits in a given period of time, or associated with a specific tag or file.

[11:10] * ericb2 suggests to read : http://wiki.services.openoffice.org/wiki/Education_ClassRoom/Practice#Bonsai_use

[11:11] <cloph> OOo did not develop tinderbox itself, it merely modified Mozilla's tinderbox2 (only slight modifications). Mozilla is a great project when it comes to such stuff (think of Bugzilla and stuff)

[11:11] <cloph> Ah, great :-)

[11:11] <cloph> <flip/> So how's tinderbox used within the OpenOffice.org project?

[11:12] <cloph> Tinderbox provides overview pages of the results, grouped per status of a CWS (more on that later)

[11:13] <cloph> It is a fast way to check: "Will a problematic cws soon hit the Master" (that's how I use those pages at least :-)) - besides those overview pages, it also has status pages for the individual cws, that show more info (also more on that later)

[11:14] <cloph> Tinderbox is also integrated with EIS, in a way that it gets the tag-list (the list of CWS, what milestone they're based on and what cws modules they include) from EIS via SOAP, and queries cvs directly to get hold of the latest milestones.

[11:15] <cloph> Some of you might also have heard of buildbot or termite already - this is a related tool that is also meant to provide a way to automatically build code on different buildslaves

[11:16] <cloph> In OOo, both real tinderboxslaves as well as (some of) the buildbots report the build status to tinderbox. From a tinderbox point of view, it doesn't matter what system built the code, it doesn't make a difference there.

[11:17] <cloph> <flip/>Pictures say more than a thousand words, so just have a look at some of the examples (in case you could resist the urge to click on one of the URLs :-))

[11:18] <cloph> The first one shows the overview page that shows the cws in nominated state, as you see, all is green (or yellow). Green is good :-)

[11:18] <cloph> <flip/>as another example a few cws in the new state. You see some are red. Red is bad :-(

[11:18] <cloph> In case you wondered what the other colors mean: <flip/>

[11:19] <cloph> OOo uses the following: There is a "success" status, a "test failed" status (orange), a "build failed" status, a "currently building" status, a "dirty" status and a "fold" status.

[11:20] * ericb2 discovering the real sense of Orange color :)

[11:20] <cloph> The test failed status is maybe a little misleading, as it currently is not used to indicate whether some tests failed or not (sorry ericb2 :-)

[11:20] <ericb2> cloph: np

[11:21] <cloph> The only buildslave that makes use of that status is the Mac PPC buildslave to indicate when it had to rebuild i18npool multiple times or similar (non-reproducible build failures that can be overcome by just rebuilding the affected modules, a rather special scenario, only affecting the PPC)

[11:22] <cloph> I guess "green" and "red" are self-explanatory. The dirty status can be set thanks to bonsai integration. That way tinderbox knows when commits have been performed after a build was started. So it knows that the results (while valid for the code that was built) don't reflect the current status of the cws anymore.
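
Editor's sketch (not part of the talk): the "dirty" rule described above can be illustrated roughly as follows. The function name and the idea of overriding the reported status are assumptions made for illustration; the real tinderbox gets the commit times from bonsai and may combine the information differently.

```python
from datetime import datetime


def effective_status(build_status, build_start, commit_times):
    """Return "dirty" if any commit landed after the build started.

    Hypothetical helper: otherwise the status reported by the
    buildslave is passed through unchanged.
    """
    if any(commit > build_start for commit in commit_times):
        return "dirty"
    return build_status


# Example data (made up): one commit before the build start, one after.
start = datetime(2008, 11, 20, 9, 0)
commits = [datetime(2008, 11, 20, 8, 30), datetime(2008, 11, 20, 10, 15)]
print(effective_status("success", start, commits))
```

The build result stays valid for the code that was compiled; the sketch only shows why it no longer describes the current state of the cws.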

[11:24] <cloph> The grey status is mainly introduced for the buildbot buildslaves, that don't manage their buildqueue themselves, but are told what to build. That way they can say: "sorry man, I don't want to build that stuff". Mainly because a build-breaker is known already, or the buildslave just only wants to build newer milestones and not old cruft..

[11:25] <cloph> <flip/> I already mentioned that Tinderbox is using EIS - the same is true the other way round. EIS uses tinderbox as well, it can show the tinderbox status in the EIS overview pages.

[11:26] <cloph> Unfortunately the default view when browsing to an EIS-CWS page is "overview", and that doesn't show the tinderbox info, but that can be configured by the user (but of course is not possible when just using the "guest" login for convenience)

[11:27] <cloph> <flip/> So far, I only talked about the overview pages, those don't offer that much info as opposed to the real per-cws status pages.

[11:27] * ericb2 always clicks "All" button

[11:27] <cloph> Again there is a cross-reference to EIS (the link at the very top will bring you to the corresponding EIS page).

[11:27] * cloph has the default (when logged in) set to Tinderbox :-D

[11:29] <cloph> The status page is structured in a table view. At the left you see a time column, next to it a "guilty" column, and after that the columns for the individual buildslaves (be it real tinderboxes like the Fedora and Mac ones in the example, or buildbots like the O3-build and Win-XP2 ones).

[11:30] <cloph> The "Guilty" column lists commits, you can have a look at http://tinderbox.go-oo.org/aquavcl08/status.html for example, that lists the latest commit by ericb2 to the cws

[11:30] <cloph> That info comes from Bonsai.

[11:30] <cloph> You can use either the commit-entry in the "guilty" column or the timeline to query bonsai for what exactly was committed.

[11:32] <ericb2> good idea to link with Bonsai

[11:32] <cloph> I basically only query bonsai by passing by the tinderbox page of the cws, since that way I don't have to fill in the query form manually, and usually I'm only interested in the commits after the last successful build, the tinderbox page makes that easier (IMHO)

[11:33] <cloph> The header of the buildslave column shows some info about the buildslave, like when the cws was built last, what the average buildtime is (actually mean, not average), how long a current build will still run

[11:33] <cloph> see e.g. http://tinderbox.go-oo.org/iconupdate300u1/status.html

[11:34] <cloph> average buildtime is around 220 minutes, and the result is overdue (that is because I'm currently building another tree outside tinderbox and that costs CPU :-))

[11:35] <cloph> The box with the status result then is the most important part: That box provides access to the buildlogs and only indicates when a cws was built.

[11:36] <cloph> Some slaves (those maintained by me at least), also specify what patches were applied (for known build-breakers affecting the Master the CWS is based on), and whether some of the abovementioned quirks were needed (in the case on the screenshot, the i18npool problem was hit)

[11:36] <cloph> In case the build was done a while back, you can as well go back in time with the "show previous xxx hours" at the bottom of the overview page.

[11:37] <cloph> (But the actual logs might not be available anymore, they get removed by a cronjob)

[11:38] <cloph> <flip/>So let's assume the build broke (is marked as red) and you want to know why it broke. Click on one of the "l L C" links to open the popup (this might be a bit tricky, since it closes when you hover over another link before reaching the popup, and also when hovering over the "close" link in the popup itself)

[11:38] <cloph> From that popup, there are links to show a brief (Summary) log and the full log.

[11:39] <cloph> In the case of OOo, where full logs can reach 40 to 50 MB (uncompressed), the only sensible way to start is by using the brief log, that only shows lines above and below an "error", and skips the rest.

[11:40] <ericb2> cloph: when you find an error, what can be done ? Do you send a mail to the dev asking him to fix the problem ?

[11:40] <cloph> The first thing listed on the brief-log page are the tinderbox annotations (more on that later), the most important being the tinderbox- administrator one. This is meant to show the admin that is responsible for the buildbot, the one who can be contacted when there's a problem with the buildslave itself/the person that can be asked for help in reading the log.

[11:41] <cloph> ericb2: Yes, either mail directly, file an issue, comment in EIS, try to reach the dev on IRC.


[11:41] <cloph> What way you choose basically depends on how urgent it is. If the cws is already nominated, do whatever you can to make them aware of the problem :-)

[11:42] <cloph> If it is still in status new, the dev might not care yet, since more changes are to come anyway/doesn't even build for the developer him/herself

[11:43] <cloph> If you look at the screenshot, you might notice one problem already: For the buildbot buildslaves, the real administrator of the buildbot is not shown, but a general alias, "buildermaster@termite.go-oo.org" - this is a limitation of buildbot currently, and might be solved in future.

[11:43] <cloph> <flip/>Now to the next part. Following the annotations, the detected error messages are listed.

[11:44] <cloph> Note that those lines are not "errors" by themselves, merely lines that /could be/ errors. It is just detecting words like "failed" or "error" in the log and using those to flag a line (of course more elaborate than that, but enough to get the idea)
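
Editor's sketch (not part of the talk): the idea of a brief log, flagging lines that contain words like "error" or "failed" and keeping some context around them, can be illustrated as below. The pattern, the function name and the sample log are assumptions; the real errorparser is more elaborate, as noted above.

```python
import re

# Words that make a line *look* like an error (illustrative only).
SUSPICIOUS = re.compile(r"\b(error|failed)\b", re.IGNORECASE)


def brief_log(lines, context=2):
    """Return (line_number, text) pairs for flagged lines plus context."""
    keep = set()
    for index, line in enumerate(lines):
        if SUSPICIOUS.search(line):
            first = max(0, index - context)
            last = min(len(lines), index + context + 1)
            keep.update(range(first, last))
    # Line numbers are 1-based, like the anchors in the log pages.
    return [(index + 1, lines[index]) for index in sorted(keep)]


# Made-up log excerpt for demonstration.
log = [
    "Compiling: a.cxx",
    "Compiling: b.cxx",
    "Compiling: c.cxx",
    "b.cxx:10: error: 'm_xORB' was not declared in this scope",
    "Compiling: d.cxx",
    "Compiling: e.cxx",
    "Compiling: f.cxx",
]
for number, text in brief_log(log, context=1):
    print(number, text)
```

Because the match is purely textual, a harmless line that merely mentions "error" would be flagged too, which is why the error count on a green build can be non-zero.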

[11:45] <cloph> As builds usually stop after they hit an error, the error is usually found at the very bottom of the list (more or less, since many buildslaves do parallel builds, so it might be further up a little)<flip/>

[11:46] <cloph> So in the first line that is shown in the next screenshot, just above the buildlog you can find the error that broke this build: "error: "m_xORB" was not declared in this scope"

[11:46] <cloph> Click on that link and it will bring you to the line where it appears in the log <flip/>

[11:47] <cloph> There it is, flagged in red, with context above and below. There you also see what I mentioned above: this was a parallel build, so you see lines of other stuff that was compiled interspersed with the module that broke.

[11:48] <cloph> The links on the left are the linenumbers, each line has an html-anchor, so you can link to any line in the log directly, the "Next" links jump to the next "error" in the log.

[11:49] <cloph> I write "error" since the error count that is shown on the colored build-status box always causes confusion: "How can a build be flagged as successful, when there have been 30 errors?" is an often heard question.

[11:50] <cloph> So now that we had a look at the basic functionality of tinderbox and at how to use it, let's switch to the "why" part, why bother?

[11:50] <cloph> I don't know how many of you already built OOo - Just let me say that building OOo takes looooong. OOo is huge, and requires much time (and also diskspace) to build.

[11:51] <cloph> It is very annoying when you start a build in the evening, to start working or testing the build the next morning, only to find out that your build broke after 20 minutes.

[11:52] <cloph> OOo's development model is designed in a way that it should ensure that there's always a usable Master.

[11:52] * lgodard has quit ("Leaving.")

[11:53] <cloph> It is split in childworkspaces, cws, where development is focused on a few issues, few features or a big one, separated from other development activities. So after a while (every week or two weeks), those cws that are done get integrated into a master. The number of cws can be quite high.

[11:54] <cloph> If the master then breaks, you need to investigate: Why does it break? Is it a combination of cws that cause the break, or is one cws just being broken?

[11:54] * Lachs (n=Gregor@sd-socks-197.staroffice.de) has joined #education.openoffice.org

[11:54] <cloph> Here's where tinderbox jumps in. It can tell: Look, this cws is flagged red, it caused a build breaker.

[11:55] <cloph> Ideally that cws will not be integrated until the problem is solved, but even when it is, that info can help to find a solution earlier, to find the developer faster who can fix the breaker.

[11:55] * lgodard (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has joined #education.openoffice.org

[11:56] <cloph> While release-engineers build the code before they announce the master as ready, they of course only use their setup, and that doesn't reflect what the community builders use. Some use Sun's java, some use gcj, some build with features that are turned off in Sun's configuration, some disable features. Some do excessive multi-processing builds, etc.

[11:57] <cloph> So the goal is: Don't release a master that cannot be built by somebody.

[11:57] <cloph> <flip/>So why does it still happen then? This brings us to the limitations of tinderbox.

[11:58] <cloph> The basic problem is compliance.

[11:59] <cloph> Not all build-breakers are faults in the code. There can be a misconfiguration of the buildslave, there can be a problem with the master that the cws is based on (so the problem is in the master, and not in the changes the developer did in his/her cws), there can be infrastructure problems (anoncvs not up-to-date or not reachable at all)

[12:00] <cloph> Furthermore people are impatient, they want results "immediately" after they committed their stuff. This is of course not possible, building takes three to four hours on fast machines, and of course the build is not started immediately after the commit, since there are other cws to be built as well.

[12:02] <cloph> Another kind of limitation, not related to buildability, is the fact that tinderbox doesn't care about whether the produced Office actually works or not, what counts is only "are there build-breakers or not". (the test_failed status already suggests that this is not a limitation of tinderbox, one could actually use a dedicated status for that), the problem is that none of the bots do run tests, that there are/

[12:02] <cloph> Furthermore running tests also costs time, meaning the build results for the cws would be delayed even further.

[12:03] <cloph> <flip/>Also while the community builders use a variety of build-configurations, tinderbox only covers a very small part of it.

[12:03] <cloph> There just aren't enough buildslaves to cover each and every setup.

[12:04] <cloph> buildslaves also use a fixed set of configure options, so don't detect when stuff breaks in code that is not activated, and because of a limitation of EIS, the buildslaves cannot build cws that introduce a new module to cvs (that module just isn't listed in EIS, the bot cannot know about it)

[12:05] <cloph> Last but not least, fixing a breaker sometimes is a lot easier or only possible when you have access to an affected buildhost, so even if a developer did have a look at the log, he/she might not be able to fix it

[12:07] <cloph> While tinderbox has a way to handle installsets (you can send files or links to installsets), given the size of OOo (140MB for Mac install set for example), it is just impossible to upload every installset that is built by the slaves, and since the tinderbox buildslaves are all self-contained, decide themselves what they build, there is no way to request an installset but by asking the maintainer.

[12:07] <cloph> (Buildbot on the other hand can be used to request an installset)

[12:07] <cloph> <flip/>Now to the recruiting part :-)

[12:08] <cloph> What can be done to help? - well the first one is simple: Provide a buildslave. But of course not everybody has a suitable build-machine or wants to maintain a buildslave, so there are other options as well

[12:09] <cloph> Be a mediator between the results and the developers. Notify them of build-breakers caused by their code (ideally in form of a patch), and maybe even more important: Notify the administrator of the bot when the build-breaker is caused by the bot, not by the code.

[12:10] * lgodar1 (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has joined #education.openoffice.org


[12:10] <cloph> The list of "errors" could be cut down as well, while it is possible to just whitelist some of the lines, it might actually be more desirable to get rid of the complaining in the first place.

[12:11] <cloph> This is kind of a janitorial task, and can cause a lot of work, but maybe someone wants to tackle it nevertheless :-)

[12:11] <ericb2> cloph: yes

[12:12] <cloph> Good :-) <flip/> so in order to setup a buildslave, you of course need to know how it actually works

[12:12] * ericb2 updated the logs for people who cannot use IRC

[12:12] <cloph> The interaction with the tinderbox system is very simple: The buildslaves just need to send their buildlogs via mail to tinderbox. Nothing more, nothing less.

[12:13] <cloph> Tinderbox then passes the logs through the errorparser to create the brief and full logs and creates the statuspages for the cws. Add the bonsai information to that and tinderbox' job is done.

[12:14] <cloph> <flip/>Of course in order to run a bot, you must be able to build OOo on your system, then automate that process and you have a tinderbox buildslave

[12:15] <cloph> <flip/>You need to pay attention to the mail though

[12:15] <cloph> tinderbox needs to know to what tree (cws) the log belongs, when the build was done, what the outcome was, what buildslave built it, etc. That's what the tinderbox annotations are for. You just put those lines above the actual log.

[12:16] <cloph> And you need to add the mail-header corresponding to the type of message: For mails with the log in the body, use X-Tinder: cookie, for logs sent as gzipped attachment, use X-Tinder: gzookie.
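
Editor's sketch (not part of the talk): a mail with annotations above the log and the X-Tinder: cookie header could be assembled like this. Only the header name and value come from the talk; the "tinderbox: key: value" line layout, the annotation names, the addresses and the subject are assumptions for illustration, so check the Tinderbox wiki page mentioned later in the log for the real ones.

```python
from email.message import EmailMessage


def build_tinderbox_mail(annotations, log_text, sender, recipient):
    """Build a mail with annotation lines placed above the log body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "build log"       # placeholder subject
    msg["X-Tinder"] = "cookie"         # log is carried in the mail body
    # Assumed annotation layout: one "tinderbox: key: value" line each.
    header = "\n".join(f"tinderbox: {key}: {value}"
                       for key, value in annotations.items())
    msg.set_content(header + "\n\n" + log_text)
    return msg


# Hypothetical annotation names, values and addresses.
mail = build_tinderbox_mail(
    {"tree": "mycws", "status": "success", "buildname": "example-slave"},
    "Compiling: a.cxx\nCompiling: b.cxx\n",
    sender="slave@example.org",
    recipient="tinderbox@example.org",
)
print(mail["X-Tinder"])
```

Actually sending the message (for instance via smtplib, or the perl tools cloph mentions) is left out on purpose.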

[12:17] <cloph> <flip/>the gzipped logs are one of those customisations applied to OOo's tinderbox installation. Uncompressed logs, as mentioned before, can be huge, 40MB and more.

[12:17] <cloph> But those logs compress very, very well. A gzipped log is 2.5 to 3 MB in size only.
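
Editor's sketch (not part of the talk): the gzipped variant can be illustrated by compressing the log and attaching it, with X-Tinder: gzookie as the header. Whether the annotations belong in the body in this case, as well as the attachment name, the addresses and the sample log, are assumptions made for illustration.

```python
import gzip
from email.message import EmailMessage


def gzipped_log_mail(annotation_text, log_text, sender, recipient):
    """Build a mail carrying the log as a gzip-compressed attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "build log (gzipped)"   # placeholder subject
    msg["X-Tinder"] = "gzookie"              # log is a gzipped attachment
    msg.set_content(annotation_text)         # assumption: annotations in body
    compressed = gzip.compress(log_text.encode("utf-8"))
    msg.add_attachment(compressed, maintype="application",
                       subtype="gzip", filename="build.log.gz")
    return msg, len(compressed)


# A repetitive, made-up log; real build logs are repetitive as well.
sample_log = "Compiling: some/module/file.cxx\n" * 10000
mail_gz, compressed_size = gzipped_log_mail(
    "tinderbox: tree: mycws\n", sample_log,
    "slave@example.org", "tinderbox@example.org",
)
print(len(sample_log), compressed_size)
```

The compression ratio of this artificial log says nothing about real OOo logs; the figures quoted in the talk are the ones to go by.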

[12:18] <cloph> Sending mail can be easily automated with perl (or mutt, or ....) - two modules that I used myself are Mail::Sender that can be installed via CPAN, and SendEmail

[12:19] <cloph> I now suggest SendEmail, since that one supports connections with TLS, as required when using gmail for example, it is a standalone program written in perl and works quite well.

[12:21] <cloph> <flip/>On the buildscript side, the script doesn't need to do much either: It needs to setup the buildtree, apply patches for known breakers (and annotate them if possible), and then finally send the captured log to tinderbox. It is advised that the buildslave doesn't only send the mail when all is finished, but also when it is starting a build, that way people know when a build is running, and when the results can be

[12:22] <cloph> Then continue the process, start with the next cws...
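
Editor's sketch (not part of the talk): the buildscript loop described above (set up the tree, apply patches, announce the start, build, send the log, continue with the next cws) might be structured like this. All callback names and the status strings are hypothetical stand-ins; cloph's actual scripts are written in perl and are not shown in the log.

```python
def run_slave(queue, setup_tree, apply_patches, build, send_mail):
    """Process each queued cws: prepare, announce, build, report."""
    for cws in queue:
        setup_tree(cws)
        patches = apply_patches(cws)             # patches for known breakers
        send_mail(cws, "building", "", patches)  # announce that a build started
        ok, log = build(cws)
        status = "success" if ok else "build_failed"
        send_mail(cws, status, log, patches)     # report the outcome with the log


# Stand-in callbacks so the sketch runs without a real build tree.
sent = []
run_slave(
    queue=["mycws", "othercws"],
    setup_tree=lambda cws: None,
    apply_patches=lambda cws: ["known-breaker.diff"],
    build=lambda cws: (cws == "mycws", f"log of {cws}"),
    send_mail=lambda cws, status, log, patches: sent.append((cws, status)),
)
print(sent)
```

Sending a mail at the start as well as at the end is what lets the status page show that a build is currently running.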

[12:22] * lgodar1 has quit ("Leaving.")

[12:22] * lgodard has quit ("Leaving.")

[12:22] * lgodard (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has joined #education.openoffice.org

[12:23] <cloph> So - that basically concludes the presentation. I learned that I type far, far too slowly to stay in the announced time, but since you're still (or again :-)) here, I don't think that really did matter... <flip/> So questions and answers time. Anyone?

[12:24] <ericb2> cloph: sorry, I was copying/pasting the changes

[12:24] <ericb2> chacha_chaudhry: questions ?

[12:25] <ericb2> cloph: I got one: to summarize, if ever I got a machine and can give processor time, how do I proceed, what do I install ? Where and whom do I ask for tips ?

[12:26] <ericb2> cloph: I noticed the first step is to complete an OpenOffice.org build

[12:26] <ericb2> cloph: and then, start with tinderbox

[12:26] * lgodard (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has left #education.openoffice.org

[12:26] <cloph> (I think I forgot to mention the link to the wiki pages in the presentation: http://wiki.services.openoffice.org/wiki/Tinderbox here you find links regarding EIS, a link to the RedTinderboxStatusinEIS page (that lists some known false positives), and also a short setup-guide)

[12:26] <chacha_chaudhry> cloph: any client side buildslave clients easy to config?

[12:26] <cloph> ericb2: Yes, the prerequisite is that one is able to build OOo.

[12:26] * lgodard (n=lgodard@AGrenoble-152-1-65-106.w86-193.abo.wanadoo.fr) has joined #education.openoffice.org

[12:27] <cloph> chacha_chaudhry: You mean ready-to-use scripts?

[12:27] <chacha_chaudhry> cloph: yes

[12:28] <cloph> I have one that I could make reusable... I use it on Linux and Mac, so it should work for those, and since I use perl, the principle would also work on cygwin (but of course I didn't pay attention regarding paths and stuff)

[12:29] <ericb2> cloph: how much time per day does it need to maintain a tinderbox ? Do you need to upgrade something from time to time ?

[12:29] <cloph> My scripts listen on a fifo for enqueue requests, you can do "echo mycws > fifo-pipe" to enqueue a build (a cronjob can automate this), clear the queue with "echo dequeue > fifo-pipe" and stop the slave with "echo quit > fifo-pipe" (will wait until build is finished)
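
Editor's sketch (not part of the talk): the three fifo commands described above can be modelled as a small dispatcher. The function name and the sample input are assumptions; cloph's real scripts are perl and read from an actual named pipe, whereas this sketch feeds lines from a list so that it runs anywhere.

```python
from collections import deque


def handle_command(command, queue):
    """Apply one line read from the fifo; return False when the slave should stop."""
    command = command.strip()
    if command == "quit":
        return False                 # stop after the current build
    if command == "dequeue":
        queue.clear()                # drop everything still waiting
    elif command:
        queue.append(command)        # anything else is a cws name to build
    return True


# Lines as they would arrive from "echo ... > fifo-pipe".
build_queue = deque()
running = True
for line in ["mycws\n", "othercws\n", "dequeue\n", "thirdcws\n", "quit\n"]:
    running = handle_command(line, build_queue)
    if not running:
        break
print(list(build_queue), running)
```

In a real slave the loop would read from the pipe instead of a list and hand queued names to the build loop between commands.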

[12:29] <cloph> ericb2: Ah, good catch.

[12:30] <cloph> A buildslave requires attention every time a new master is released.

[12:30] <chacha_chaudhry> cloph: may you upload it some place ? It would be helpful

[12:30] <cloph> You need to check whether that new milestone built fine on your machine, and if not hunt for the necessary patches/file issues so that the master can be built again.
