User:TerryE/phpBB3.0.4 Migration/Standardisation Strategy

From Apache OpenOffice Wiki
< User:TerryE
Revision as of 10:49, 7 April 2009 by TerryE (Talk | contribs)

Jump to: navigation, search

OOo Community Forums Migration Strategy

The current openOffice.org (“OOo”) forums (“Community Forums”) started with the creation of the English forum in Nov 2007 running on phpBB 3.0.0 over a Sun Coolstack platform and PostgreSQL 8.1 running on Solaris x86 infrastructure in the Sun Munich Facility. We subsequently upgraded to phpBB 3.0.1 and PostgreSQL 8.2 and then added the Hungarian, French, Spanish and Japanese forums. We are now in the process of adding the Vietnamese and Chinese forums. Given the proliferation of forums, the general growth in transactional volumes and the fact that phpBB is now at version 3.0.4, we have decided to upgrade the H/W infrastructure and install a clean re-baseline for all forums. This wiki page describes the process and approach for migration the OOo forums from their current platform to the new installation.

Assumptions and Constraints

  • The Community Forums are running on a number of separate phpBB instances all at version phpBB 3.0.1 (but with disjoint code hierarchies) over a common PostgreSQL 8.2 database on a Solaris/Coolstack platform.
  • The content of the Community Forums is largely maintained in the central database, though attachment and image data are maintained in the file separate file hierarchies. All such content will be preserved on migration.
  • A new hardware platform will be running Solaris 10, plus Coolstack 1.3.1 including the optional CSKPython package. All instances will use the 32bit CSK packaged MySQL 5.0.
  • Each instance of the Community Forums implements a language-specific forum (an “NL forum”). At cut-over to the new hardware, these will comprise the following NL forums: English (en), Spanish (es), French (fr), Hungarian (hu), Japanese (ja), Dutch (nl), Vietnamese (vi) and Chinese (zh).
  • Each NL forum will have a designated lead “NL Administrator” who is responsible for final acceptance and configuration of the forum as follows:
  • Each NL forum will have a designated alternate NL Administrator who can act as a deputy to the lead.
  • The target Community Forums configuration will be a largely “out of the box” configuration of phpBB version 3.0.4 designed to facilitate the easy addition of extra language instances.
  • All NL forums will adopt the phpBB Prosilver template with minor tweaks to implement OOo branding. The Subsilver will not be supported.
  • “Supported Languages” at cut-over to the new hardware will be English (UK and US dialects), French, Spanish (Spain, Argentina, Catalan, Valencia dialects), Hungarian, Mandarin (Traditional and Simplified Scripts), Japanese, Dutch and Vietnamese.
  • All Supported Languages will be available to all NL forums. However it is up to the lead NL Administrators to decided which languages are enabled for their forum. This will normally be UK English and the local languages. The lead NL Administrator will also decide the default Language for the forum.
  • Each Language Forum will contain an administration sub-forum, access to which will be limited to trusted moderators and administrators.
  • We want to move to as "Zero Admin" an approach as possible. Therefore all routine system administration functions will be scripted by means of automated (usually crontab based) scripts.
  • A common code base will be shared by all instances by sym-linking the individual instance hierarchies to a common directory set. This will be based on a fresh install of phpBB from the phpBB Downloads reference site.
  • The Community Forums will retain a small number of functional changes as detailed in section Forum Customisations below. These changes are that subset of the changes applied to current individual Language Forums, filtered by criteria based on proven user benefit of the change compared to the incremental support effort,
  • Python will be used as the standard system administration scripting Language.

Code Base

Language
Language files
Prosilver GIFs
English (U.K.) (built-in) (built-in)
English (U.S.) lang_en_us.tar.gz prosilver_en_us.tar.gz
French lang_fr.tar.gz prosilver_fr.tar.gz
Catalan lang_ca.tar.gz prosilver_ca.tar.gz
Catalan (Valencian) lang_ca_pv.tar.gz prosilver_ca_pv.tar.gz
Spanish lang_es.tar.gz prosilver_es.tar.gz
Spanish (Argentina) lang_es_ar.tar.gz prosilver_es_ar.tar.gz
Hungarian lang_hu.tar.gz prosilver_hu.tar.gz
Mandarin Chinese (Traditional script) lang_zh_cmn_hant.tar.gz prosilver_zh_cmn_hant.tar.gz
Mandarin Chinese (Simplified script) lang_zh_cmn_hans.tar.gz prosilver_zh_cmn_hans.tar.gz
Japanese lang_ja.tar.gz prosilver_ja.tar.gz
Vietnamese (unsupported translation) (unsupported translation)
  • Note that the Vietnamese is not yet available through the phpBB site.

Overview of the Migration Philosophy

To analyse the differences in the current NL forum implementations, I have written some scripts to do a smart delta of the seven current language-specific directory hierarchies. Essentially these configurations are an exercise of 'entropy in action': every configuration is slightly different, both in terms of language packs and code modifications. This means than any upgrade of the baseline phpBB package has to address seven different versions. The consequences for long term maintenance are just a killer and this is going to get worse as we add support for extra languages, so we must move to a single common code base for all forums.

Previous to this migration I did a migration of the Virtualbox Community Forums from phpBB V2 to V3.0.4, and used our OOo configuration and customisations as a starting point for this target phpBB implementation. I reviewed all of the OOo customisations in terms of their value and volunteer feedback, and junked any that haven't had a proven benefit. This "junked" list includes my experimental kB patch; the heavy OOo customisation of the standard phpBB FAQ customisation; the beginners template; the email link to the forumadmin on all pages. I then regressed all remaining changes onto the standard phpBB V3.0.4 to derive a patch to implement the required changes. With the exception of the VirtualBox vs. OOo specific branding in the headers and footers, this work largely applies to our target configuration as well.

The main difference between the OOo and VirtualBox implementations is that VirtualBox has a single production forum, whereas we maintain a number of separate NL forums within OOo. Key to making a scalable implementation is the recognition that some data is forum content or (NL) forum-specific and some is intrinsically common across all NL forums. What we need to do is to establish a clear demarcation between this instance-specific and the common configuration. What complicates this is that the phpBB implementation uses both a database and a filestore based directory hierarchy. However, I have adopted some simple rules to help here:

  • Database stored content is always treated as instance-specific. Each NL instance has its own schema which is identified by it (lowercase) two letter identifier. This is identifier is also embedded into the phpbb_xx_ tablename prefix.
  • Where practical, file content is segregated so that individual directories are either instance-specific and the common, but not both.
  • The common directories are not duplicated but symlinked to a common base directory.
  • The common files are not duplicated but symlinked to a common base file.

One small side-effect is that a common set of language templates to be maintained across all NL forums, and hence all languages will be available to the individual NL Administrators (though these might only chose to enable a subset of languages within their own forum). One subtly is where configuration items are NL forum-specific rather than user language-specific; one example is that an advisory message is displayed in the post form asking users to read the Survival Guide before posting. We will have a Survival Guide in every forum, so this language-specific instruction can a direct translation of a common instruction, that this the wording can be common in meaning across every language and NL forum. Hence this text will be in the language template. However, the link to actual guide is forum specific and this will be maintained in the forum's system configuration table:

  • Language specifics will be maintained in the language template
  • Forum specifics will be maintained in the forum phpBB_XX_config table.

Directory and File Organisation

phpBB establishes its root with Apache is by examining the current directory of any script that is directly executed through a URI. These URI executed scripts are as follows:

  • forum: cron.php, faq.php, file.php, index.php, listforums.php, mcp.php, memberlist.php, posting.php, report.php, search.php, style.php, ucp.php, viewforum.php, viewonline.php, viewtopic.php
  • forum/adm:index.php, swatch.php
  • forum/download: file.php

Each forum will therefore have private copies of these directories (the forum root, adm, download) along with the other sub-directories which contain forum-specific files: cache, download, files, instance-images(+), store. The remaining subdirectories will be sym-linked to a common repository:

  • docs -> ../../common/forum/docs
  • language -> ../../common/forum/language
  • styles -> ../../common/forum/styles
  • includes -> ../../common/forum/includes
  • images -> ../../common/forum/images.

The individual files in the top level directory, etc., will be sym-linked to the common copy:

  • forum/config.php -> ../../common/forum/config.php(X)
  • forum/common.php -> ../../common/forum/common.php
  • forum/cron.php -> ../../common/forum/cron.php
  • forum/faq.php -> ../../common/forum/faq.php
  • forum/file.php -> ../../common/forum/file.phpx`
  • forum/index.php -> ../../common/forum/index.php
  • forum/listforums.php -> ../../common/forum/listforums.php
  • forum/mcp.php -> ../../common/forum/mcp.php
  • forum/memberlist.php -> ../../common/forum/memberlist.php
  • forum/posting.php -> ../../common/forum/posting.php
  • forum/report.php -> ../../common/forum/report.php
  • forum/search.php -> ../../common/forum/search.php

(+) In general the subdirectories are cleanly split into holding shared or instance-specific data. The one exception which is a mess is the images subdirectory. This contains shared content such as the avatar image galleries and instance-specific data such as uploaded avatars. However, the locations of this instance-specific data is configurable and I have therefore tidied this up by moving these instance-specific data to a separate instance-images subdirectory, so that images can be shared:

Config item
Type
Current Default
New Values
avatar_path Specific images/avatars/upload avatars-upload
avatar_gallery_path Shared images/avatars/gallery images/avatars/gallery
upload_icons_path Shared images/upload_icons images/upload_icons
ranks_path Shared images/ranks images/ranks

(X) The config.php file has been modified to pick up the NL designator from the current path, enabling the same config.php to be used for all NL instances.

This may all sound complex and inefficient, but it actually turns out to be better that the current policy of private instance copies of code and templates. Where the individual instance symlink to common code, this is understood by the php5 engine's code caching mechanism, so code caching is more effective as the code is shared across NL instances. In the case where (as in the case of the template based code) code is dynamically modified to reflect instance-specific configuration, in this case the phpBB caching system means that the material overhead of this redirection where the code is is practically removed.

Database Organisation

Hence the software configuration of the new NL foruMS will be a clean patched install. Only the forum content will be migrated. This content is largely contained in a set of NL forum-specific database tables, though it also comprises data in the instance-specific of directories discussed in the previous section.

We do have a dilemma in that setting up the exact configuration of each NL-specific configuration is somewhat iterative will involve some work and tweaking. However, whilst we are doing this our end-user communities will be continuing to use and to add content. Hence not only do I need to move the individual NL-specific database content from a phpBB 3.0.1 / PostgreSQL 8.2 format to a phpBB 3.0.4 / MySQL 5.0 one, I also need to accommodate a two stage migration where in the second stage I can remigrate the user and post content which has changed in the prototyping time-window, which a call "Live Resynchronisation".

I have written a small shell script (as this runs on the old hardware which doesn't have Python installed) which can dump either all tables relating to a specific language forum or the subset of tables that contain the user and post data. I also have written a second python script which converts this PostgreSQL V3.0.1 schema and dump into a MySQL V3.0.4 compatible load.

Live Resynchronisation exploits the natural structure of the phpBB data model, and specifically that the phpBB tables can be effectively grouped into the following two categories. This enables the overlay of the current user and post data onto a previous migration:

  1. Tables which define the forum configuration. These largely define the structure and configuration forum, and are only changed when this configuration is changed. These tables can be migrated and refined during the pilot runs to finalise the new database layout. As long as the configuration is frozen during the migration period, e.g. no changes to the forum hierarchy, these tables don’t need to be updated. The one exception here is the forums table which contains some denormalised data elements (e.g. the number of topic in the forum) which need special treatment as discussed below.
    acl_groups, acl_options, acl_roles, acl_roles_data, bbcodes, bots, config, disallow, extension_groups, extensions, forums, forums_access, groups, icons, knowledge_base, lang, moderator_cache, modules, profile_fields, profile_fields_lang, profile_lang, ranks, search_wordlist, search_wordmatch, sitelist, smilies, styles, styles_imageset, styles_imageset_data, styles_template, styles_template_data, styles_theme, words, zebra
  2. Tables which contain post and user related data. These contain the user content that evolve over time as users register and post to topics. These need to be migrated again in the second phase and then copied over the target database.
    acl_users, attachments, banlist, bookmarks, confirm, drafts, forums_track, forums_watch, log, poll_options, poll_votes, posts, privmsgs, privmsgs_folder, privmsgs_rules, privmsgs_to, profile_fields_data, reports, reports_reasons, search_results, sessions, sessions_keys, topics, topics_posted, topics_track, topics_watch, user_group, users, warnings

By splitting the tables this way I can migrate the database, then we can take some days getting it configured correctly whilst the live (phpBB v3.0.1) database continues to be updated. I can then remigrate the post and user related data, so that you can tweak the rest of the configuration during testing in parallel to live use of the database. As Live Resynchronisation does not mirror groups, role based access to forums, etc., these must be be frozen over this short period, or any necessary changes will need to be manually resynchronised.

On a first load the full schema is loaded and all tables. On resynchronisation of post and user related data, only the tables in the second group are truncated and reinserted. Also note that on resynchronisation we need to copy the contents of the files and images/avatars/upload folders that have been created or updated since the initial backup.

The forum table is denormalised to include information on total topic counts, details on the last post, etc. The merge script include SQL updates to update these fields from the topics table.

Migration Process

The migration will take place in three phases:

  • The first involves building a complete forum set in a virtual LAMP appliance. It's just easier for me to do this on a LAN connected VM than on the target Solaris hardwar and zone in the Sun Labs in Germany over the internet. Then each of the current language databases will be snapped as a SQL backup into the forum backups directory, and then wget pulled to the LAMP appliance for loading and configuring into the test system. I will test and tune this LAMP configuration to validate the basic configuration.
  • Once I have tested this configuration, I will then transfer it to the OOoTest system in Germany. Prior to the transfer I will truncate the tables in group (1) before exporting the database. I then dump the group (1) tables and refresh the test database to resync it to live. At this point the individual language forum leaders will be given Administrator access to each forum to validate functionality and configuration.
  • Once the Administrators have confirmed the system configuration of their respective forums, we are then ready to go live. The cut-over itself will be based on a process that I developed and tested in the VirtualBox migration. This is to bring the live forums offline, then backup the current user and post data and then overlay this onto the target OOoTest server. This involves truncating and replacing the content of the following tables from live. Once this import is complete the forums are available for go-live. This cut-over process actually involves a small down time (roughly 30 mins), but in practice this will be dominated by the DNS reconfiguration times to remap user.services.openoffice.org to the new hardware.

Forum Customisation

The delivered customised instances contains the following changes:

  • Reorder window title to make multi-tab title more usable. All browser tabs are labelled with their title. The phpBB3 default is to begin the title with the forum name then page category, hence if you have a lot of tabs open they are called openOffi... By putting the specific title first, it makes it a lot easier to pick the correct tab.
  • Add Expand option to Code Block. phpBB3 now puts each code block in a scrollable window. This makes it a pain if you want to see all of the code in the browser. This mod adds an Expand option to the code block to allow you to blow it up to full size.
  • Tweak Access key codes to remove conflicts. For those that use keyboard accelerators within the browser, the lack of a shortcut for Preview Post is a pain. This makes <Accel>+P a shortcut for Preview and <Accel>+G a shortcut for graphics.
  • Add Active Post count to View Active Topics. This just adds a count of new posts in the Active topics heading.
  • Add truncated title of last topic to Forum View. Lots of OOo volunteers asked for this one. It enables you to recognise at a glance if a forum has any new posts in it.
  • Customise Headers and Footers to OpenOffice.org branding. Add OOo Community logo and adjust framing colours to match. Add OOo favicon.
  • Rebasline to phpBB 3.0.4. The install is a patch file which updates the standard 3.0.4 unpacked tarball.
  • Add OpenOffice.org Qs to BB FAQ. The FAQ button goes to a phpBB3 standard FAQ which is all about using phpBB3. I've added a Q0 which points the user to the wiki and forum FAQs for openOffice.org info.
  • Add instruction to “read Survival Guide before posting” to post form. The posting form now includes an instruction to read the survival guide before you post. (This is only shown if you have less than 15 posts).
  • Customisations that are configurable through phpBB. These don't need code or style changes:
    • Add custom fields to define users, main OS, installed OOo version and supplementary OSs.
    • Add additional BBcodes for [edit], [pre], [strike], [center], [sub], [super], [hr]
Personal tools