User:DrewJensen/why why

From Apache OpenOffice Wiki

Analysis done by TerryE in April of this year. Original posting at OOoForum.

OOoForum is sustaining about 3 transactions per second in the heavy hours, generating over 6 Gbytes of traffic per day, and has a database of around 400 Mbytes. This is quite a profile, guys: heavier than I had anticipated until I cranked the numbers.

TerryE: I've since done three more OOoForum database snapshots, which have been pretty complete. The updated summary totals for these are as follows:

Date of D/B snapshot               30-Apr-07    29-Jul-07    16-Sep-07
Days elapsed since previous snap          71           90           49
Total Views                       18,290,849   20,220,874   21,364,950
Total Views per day                                21,445       23,348
Total Posts (= topics + replies)     213,624      223,735      231,948
Total Posts per day                                   112          168

These data give a lower figure than previously. There are two factors here: (i) I missed some topics in my first pull and did not correctly adjust for this, which led to an overstated delta value (the server was so slow that there were a lot of timeouts). What this still shows is that we are averaging about 1,000 views per hour across the day. If you look at the peak periods, this equates to roughly one view per second. The post volumes are a lot lower, at around 4 per hour on average, say 10 per hour at peak, or one every six minutes.

Note the view-to-post ratios: the vast majority of transactions are queries (about 150:1 read:update). Nonetheless, at ~60-70 Kbytes per view under phpBB, this still represents daily traffic of about 1.5 Gbytes. You can find the analysis spreadsheet here (3 Mbytes) and the full copy of the forum posts in TSV format here (48 Mbytes).
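The per-day rows in the snapshot table are the differences between consecutive cumulative totals divided by the days elapsed. A minimal Python sketch of that arithmetic, reusing the published totals; the 65 Kbytes per view is an assumed midpoint of the ~60-70 Kbytes quoted, and spreading views evenly over 24 hours ignores the peak periods.

```python
from datetime import date

# Snapshot totals copied from the table: (snapshot date, total views, total posts).
SNAPSHOTS = [
    (date(2007, 4, 30), 18_290_849, 213_624),
    (date(2007, 7, 29), 20_220_874, 223_735),
    (date(2007, 9, 16), 21_364_950, 231_948),
]

# Assumption: average page weight per view under phpBB (midpoint of ~60-70 Kbytes).
KBYTES_PER_VIEW = 65


def interval_rates(snapshots):
    """Yield per-interval rates derived from consecutive cumulative snapshots."""
    for (d0, v0, p0), (d1, v1, p1) in zip(snapshots, snapshots[1:]):
        days = (d1 - d0).days
        views_per_day = (v1 - v0) / days
        posts_per_day = (p1 - p0) / days
        yield {
            "to": d1,
            "days": days,
            "views_per_day": views_per_day,
            "views_per_hour": views_per_day / 24,  # flat 24-hour average, no peaks
            "posts_per_day": posts_per_day,
            "views_per_post": views_per_day / posts_per_day,  # read:update ratio
            # Rough estimate: 1 Gbyte taken as 1e6 Kbytes.
            "gbytes_per_day": views_per_day * KBYTES_PER_VIEW / 1e6,
        }


for r in interval_rates(SNAPSHOTS):
    print(f"{r['to']}: {r['days']} days, {r['views_per_day']:,.0f} views/day "
          f"({r['views_per_hour']:,.0f}/hour), {r['posts_per_day']:.0f} posts/day, "
          f"{r['views_per_post']:.0f}:1 read:update, ~{r['gbytes_per_day']:.1f} Gbytes/day")
```

Working from the cumulative totals rather than from the per-day column means a missed or partial pull shows up directly as an odd interval rather than being silently averaged in.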

Next is an analysis of the spread of posting. Here I have analysed posts per poster, ranked posters by their totals, and grouped them into bands.

Post band    #Posters     #Posts
1-5            27,780     52,842
6-15            3,219     27,560
16-50             875     21,865
51-200            194     16,468
201-1,000          58     25,090
1,000+             30     68,583

For a more detailed breakdown, see Aug 20, 2007 Posting Pattern Analysis.

In fact, the top 68 posters account for over 45% of all board activity, and this is overwhelmingly in response to users' questions.
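The banding step can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet actually used: it assumes per-poster post counts are available as a {poster: count} mapping (the poster names in the usage line are hypothetical) and assigns a count of exactly 1,000 to the 201-1,000 band, since the table's band edges overlap there. As a consistency check on the published table, the average posts per poster in each band should fall inside that band's range.

```python
from collections import Counter

# Band edges from the post-band table: (label, lowest, highest or None if open-ended).
# Assumption: a poster with exactly 1,000 posts falls in "201-1,000", not "1,000+".
BANDS = [
    ("1-5", 1, 5),
    ("6-15", 6, 15),
    ("16-50", 16, 50),
    ("51-200", 51, 200),
    ("201-1,000", 201, 1000),
    ("1,000+", 1001, None),
]

# Published totals: (band label, #posters, #posts).
PUBLISHED = [
    ("1-5", 27_780, 52_842),
    ("6-15", 3_219, 27_560),
    ("16-50", 875, 21_865),
    ("51-200", 194, 16_468),
    ("201-1,000", 58, 25_090),
    ("1,000+", 30, 68_583),
]


def band_of(n_posts):
    """Return the label of the band containing a poster's post count."""
    for label, lo, hi in BANDS:
        if n_posts >= lo and (hi is None or n_posts <= hi):
            return label
    raise ValueError(f"post count must be at least 1, got {n_posts}")


def band_summary(posts_by_poster):
    """Group a {poster: post_count} mapping into (label, #posters, #posts) rows."""
    posters, posts = Counter(), Counter()
    for n in posts_by_poster.values():
        label = band_of(n)
        posters[label] += 1
        posts[label] += n
    return [(label, posters[label], posts[label]) for label, _, _ in BANDS]


# Consistency check on the published table: mean posts per poster and share of posts.
total_posts = sum(n for _, _, n in PUBLISHED)
for label, n_posters, n_posts in PUBLISHED:
    print(f"{label:>10}: {n_posts / n_posters:8.1f} posts/poster, "
          f"{100 * n_posts / total_posts:5.1f}% of posts")

# Hypothetical usage with made-up poster names.
print(band_summary({"poster_a": 3, "poster_b": 12, "poster_c": 1500}))
```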

Views in fact dominate the transactions, with around 130 views for every post. This is partly because many of us preview and view while preparing a post, but there is also a huge body of "read-only" guests who are continually browsing the forum (as I write this, there are 2 power contributors, 3 occasional users and 94 guests logged onto OOoForum).

Analysis of usage rates by time of day

[Image:PostsByToD.png Posts By Time of Day]

Analysis of growth rates in posting transactions per month

[Image:PostsByMoth.png Posts By Month]
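Both figures come down to bucketing post timestamps. A minimal Python sketch, assuming post times are available as Unix timestamps (how they would be pulled from the phpBB database is not shown and is an assumption): it computes each GMT hour's share of posts, the same kind of 3-6% hourly figure quoted in the second analysis, and posts per month with month-on-month growth.

```python
from collections import Counter
from datetime import datetime, timezone


def to_gmt(epoch_seconds):
    """Convert a Unix timestamp to an aware UTC (GMT) datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def hourly_share(post_times):
    """Percentage of posts in each GMT hour of the day; index 0 is midnight."""
    hours = Counter(to_gmt(t).hour for t in post_times)
    return [100.0 * hours[h] / len(post_times) for h in range(24)]


def monthly_counts(post_times):
    """Posts per calendar month as an ordered {(year, month): count} dict."""
    months = Counter((d.year, d.month) for d in map(to_gmt, post_times))
    return dict(sorted(months.items()))


def month_on_month_growth(counts):
    """Percentage change between each month and the previous month in `counts`."""
    keys = list(counts)
    return {cur: 100.0 * (counts[cur] - counts[prev]) / counts[prev]
            for prev, cur in zip(keys, keys[1:])}


# Hypothetical sample post times, built from explicit GMT datetimes.
sample = [datetime(2007, 3, 31, 0, 15, tzinfo=timezone.utc).timestamp(),
          datetime(2007, 3, 31, 0, 45, tzinfo=timezone.utc).timestamp(),
          datetime(2007, 4, 2, 13, 0, tzinfo=timezone.utc).timestamp(),
          datetime(2007, 5, 1, 0, 5, tzinfo=timezone.utc).timestamp()]
print(hourly_share(sample)[0])                        # 75.0
print(month_on_month_growth(monthly_counts(sample)))  # {(2007, 4): -50.0, (2007, 5): 0.0}
```

Working in GMT throughout keeps the hour buckets comparable with the midnight-GMT peak mentioned in the second analysis.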

A second analysis was performed in June of this year. Full posting at

These are in pairs (#Replies, #Views) for the three downloads that I've done: 18 Feb, 30 Mar, 29 Jun. Some magic numbers:
  • We've been running pretty steady at 190 posts per day on OOoForum. The hourly share varies from about 3% to 6%, with the peak window at midnight GMT; that equates to 12 posts per hour, or one every 5 minutes, with peak bursts of maybe 2-3x that. We have on average 4 posts per topic, so that's 40-50 new topics per day.
  • OOoForum had about 1.9M views in this same period, which equates to 21K per day, or 900 per hour on average. That is quite different from the 12 posts per hour.
  • At first I got a bizarre correlation between the number of views and the number of posts per topic. I thought that this was due to bot activity, but when I looked at individual sets (say, topics with 3 replies) I got a very different pattern. I wrote a little routine to histogram views by number of replies, and this showed a common pattern, best seen in the attached graph: a log-lin plot of a histogram of view counts by topic over the last 90 days.

[Image:Histogram1.png Histogram]

This has a strong negative exponential characteristic. One of the strongest causal mechanisms for this is that people's criteria for deciding to view a given post are largely independent. (There was, though, a small collection of top posts: the top 1,000 topics picked up 0.5M of the 1.9M views.)
  • The number of views is strongly correlated with the number of replies because (i) a message's content is approximately proportional to its number of replies, so people tend to hit larger messages when searching, and (ii) users tend to regard messages with lots of replies and views as interesting.
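The "little routine" itself is not part of the posting. A minimal Python sketch of the same idea, assuming topics are available as hypothetical (replies, views) pairs and choosing a fixed bin width arbitrarily: it builds one view-count histogram per reply count and fits a straight line to the logarithm of the bin counts, which is what a straight line on the log-lin plot amounts to. A negative fitted slope b corresponds to the negative exponential characteristic; ordinary least squares on logged counts is a simple choice here, not necessarily what the original analysis did.

```python
import math
from collections import Counter


def view_histogram(view_counts, bin_width):
    """Count topics per fixed-width bin of view count: {bin_start: n_topics}."""
    bins = Counter((v // bin_width) * bin_width for v in view_counts)
    return dict(sorted(bins.items()))


def histograms_by_replies(topics, bin_width):
    """One view-count histogram per reply count, from (n_replies, n_views) pairs."""
    groups = {}
    for replies, views in topics:
        groups.setdefault(replies, []).append(views)
    return {r: view_histogram(vs, bin_width) for r, vs in sorted(groups.items())}


def fit_log_linear(hist):
    """Least-squares fit of ln(n_topics) = a + b * views over the non-empty bins.

    A straight line on a log-lin plot means n_topics ~ exp(a) * exp(b * views),
    so b < 0 is the negative exponential shape. Empty bins never appear in the
    histogram, which matters because ln(0) is undefined.
    """
    xs = list(hist)
    ys = [math.log(n) for n in hist.values()]
    k = len(xs)
    if k < 2:
        raise ValueError("need at least two non-empty bins to fit a line")
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical (replies, views) pairs; real input would come from a forum dump.
topics = [(3, 12), (3, 48), (3, 95), (3, 30), (3, 7), (0, 5)]
for replies, hist in histograms_by_replies(topics, bin_width=25).items():
    if len(hist) >= 2:
        print(replies, fit_log_linear(hist))
```

Fitting each reply count separately is what separates the common exponential shape from the overall views-versus-replies correlation.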