User:TerryE/phpBB3.0.4 Migration/Standardisation Strategy

From Apache OpenOffice Wiki
< User:TerryE
Revision as of 17:29, 23 January 2010 by TerryE (Talk | contribs)

Jump to: navigation, search

OOo Community Forums Migration Strategy

Activities on this page are now closed

The current openOffice.org (“OOo”) forums (“Community Forums”) started with the creation of the English forum in Nov 2007 running on phpBB 3.0.0 over a Sun Coolstack platform and PostgreSQL 8.1 on Solaris x86 infrastructure in the Sun Munich Facility. We subsequently upgraded to phpBB 3.0.1 and PostgreSQL 8.2 and then added the Hungarian, French, Spanish and Japanese forums. We are now in the process of adding the Vietnamese and Chinese forums. Given the proliferation of forums, the general growth in transactional volumes and the fact that phpBB is now at version 3.0.4, we have decided to upgrade the H/W infrastructure and install a clean re-baseline for all forums. This wiki page describes the process and approach for migration the OOo forums from their current platform to the new installation.

Assumptions and Constraints

  • The Community Forums are running on a number of separate phpBB instances all at version phpBB 3.0.1 (but with disjoint code hierarchies) over a common PostgreSQL 8.2 database on a Solaris/Coolstack platform.
  • The content of the Community Forums is largely maintained in the central database, though attachment and image data are maintained in the file separate file hierarchies. All such content will be preserved on migration.
  • A new hardware platform will be running Solaris 10, plus Coolstack 1.3.1 including the optional CSKPython package. All instances will use the 32bit CSK packaged MySQL 5.0.
  • Each instance of the Community Forums implements a language-specific forum (an “NL Forum”). At cut-over to the new hardware, these will comprise the following NL Forums: English (en), Spanish (es), French (fr), Hungarian (hu), Japanese (ja), Dutch (nl), Vietnamese (vi) and Chinese (zh).
  • Each NL Forum will have a designated lead “NL Administrator” who is responsible for final acceptance and configuration of the forum as follows:
  • Each NL Forum will have a designated alternate NL Administrator who can act as a deputy to the lead.
  • The target configuration of the Community Forums will be a largely “out of the box” configuration of phpBB version 3.0.4 designed to facilitate the easy addition of extra language instances.
  • All NL Forums will adopt the phpBB Prosilver template with minor tweaks to implement OOo branding. The Subsilver will not be supported.
  • “Supported Languages” at cut-over to the new hardware will be English (UK and US dialects), French, Spanish (Spain, Argentina, Catalan, Valencia dialects), Hungarian, Mandarin (Traditional and Simplified Scripts), Japanese, Dutch and Vietnamese.
  • All Supported Languages will be available to all NL Forums. However it is up to the lead NL Administrators to decided which languages are enabled for their forum. This will normally be UK English and the local languages. The lead NL Administrator will also decide the default Language for the forum.
  • Each NL Forum will contain an administration sub-forum, access to which will be limited to trusted moderators and administrators.
  • We want to move to as "Zero Admin" an approach as possible. Therefore all routine system administration functions will be scripted by means of automated (usually crontab based) scripts.
  • A common code base will be shared by all instances by sym-linking the individual instance hierarchies to a common directory set. This will be based on a fresh install of phpBB from the phpBB Downloads reference site.
  • The Community Forums will retain a small number of functional changes as detailed in section Forum Customisations below. These changes are that subset of the changes applied to current individual NL Forums, filtered by criteria based on proven user benefit of the change compared to the incremental support effort,
  • Python will be used as the standard system administration scripting Language.

Code Base

Language
Language files
Prosilver GIFs
English (U.K.) (built-in) (built-in)
English (U.S.) lang_en_us.tar.gz prosilver_en_us.tar.gz
French lang_fr.tar.gz prosilver_fr.tar.gz
Catalan lang_ca.tar.gz prosilver_ca.tar.gz
Catalan (Valencian) lang_ca_pv.tar.gz prosilver_ca_pv.tar.gz
Spanish lang_es.tar.gz prosilver_es.tar.gz
Spanish (Argentina) lang_es_ar.tar.gz prosilver_es_ar.tar.gz
Dutch lang_nl.tar.gz prosilver_nl.tar.gz
Hungarian lang_hu.tar.gz prosilver_hu.tar.gz
Mandarin Chinese (Traditional script) lang_zh_cmn_hant.tar.gz prosilver_zh_cmn_hant.tar.gz
Mandarin Chinese (Simplified script) lang_zh_cmn_hans.tar.gz prosilver_zh_cmn_hans.tar.gz
Japanese lang_ja.tar.gz prosilver_ja.tar.gz
Vietnamese (unsupported translation) (unsupported translation)
  • Note that the Vietnamese is not yet available through the phpBB site.

Overview of the Migration Approach

I have written some scripts to analyse the differences in the current NL Forum implementations, by doing a smart delta of the seven current language-specific directory hierarchies. These configurations have proven to be a case of 'entropy in action' as every configuration is slightly different, both in terms of language packs and code modifications. This means than any upgrade of the baseline phpBB package has to address seven different versions. The consequences for long term maintenance are just a killer and this is going to get worse as we add support for extra languages, so I will be moving to a single common code base for all forums.

Previous to this migration I did a migration of the Virtualbox Community Forums from phpBB V2 to V3.0.4, and used our OOo configuration and customisations as a starting point for this target phpBB implementation. I reviewed all of the OOo customisations in terms of their value and volunteer feedback, and junked any that haven't had a proven benefit. This "junked" list includes my experimental kB patch; the heavy OOo customisation of the standard phpBB FAQ customisation; the beginners template; the email link to the forumadmin on all pages. I then regressed all remaining changes onto the standard phpBB V3.0.4 to derive a patch to implement the required changes. With the exception of the VirtualBox vs. OOo-specific branding in the headers and footers, this work largely applies to our target configuration as well.

The main difference between the OOo and VirtualBox implementations is that VirtualBox has a single production forum, whereas we maintain a number of separate NL Forums within OOo. I want to ensure that we maintain a scalable implementation for our NL Forums. Whilst the forum content and some aspects of forums configuration are NL specific, the code base and much of the forum configuration is intrinsically common across all NL Forums. (The reasons for these difference are not only content and posting language, but also the size of the NL community. For example, the English community has a relatively high volume of postings so this forum can support a rich forum structure, whilst the Hungarian community is smaller and the forum structure needs to be aligned to this.

Key to achieving this scalabalility is establishing a clear demarcation between the instance-specific content and the common configuration. What complicates this is that the phpBB implementation uses both a database and a filestore based directory hierarchy. However, I have adopted some simple rules to help here:

  • Database stored content is always treated as instance-specific. Each NL instance has its own schema which is identified by its (lowercase) two letter identifier. This is identifier is also embedded into the phpbb_xx_ tablename prefix.
  • Where practical, file content is segregated so that individual directories are either instance-specific and the common, but not both.
  • The common directories are not duplicated but symlinked to a common base directory.
  • The common files are not duplicated but symlinked to a common base file.

One small side-effect is that a common set of language templates to be maintained across all NL Forums, and hence all languages will be available to the individual NL Administrators (though these might only chose to enable a subset of languages within their own forum). One subtly is where configuration items are NL Forum-specific rather than user language-specific; one example is that an advisory message is displayed in the post form asking users to read the Survival Guide before posting. We will have a Survival Guide in every forum, so this language-specific instruction can a direct translation of a common instruction, that this the wording can be common in meaning across every language and NL Forum. Hence this text will be in the language template. However, the link to actual guide is forum specific and this will be maintained in the forum's system configuration table:

  • Language specifics will be maintained in the language template
  • Forum specifics will be maintained in the forum phpBB_XX_config table.

Directory and File Organisation

PhpBB establishes its root with Apache is by examining the current directory of any script that is directly executed through a URI. These URI executed scripts are as follows:

  • forum: cron.php, faq.php, file.php, index.php, listforums.php, mcp.php, memberlist.php, posting.php, report.php, search.php, style.php, ucp.php, viewforum.php, viewonline.php, viewtopic.php
  • forum/adm:index.php, swatch.php
  • forum/download: file.php

Each forum will therefore have private copies of these directories (the forum root, adm, download) along with the other sub-directories which contain forum-specific files: cache, download, files, avatars-upload and store. The remaining subdirectories docs, language, styles, includes and images are sym-linked to the corresponding directory in the common repository (e.g. images -> ../../common/forum/images). In general the subdirectories are cleanly split into holding shared or instance-specific data. The one exception which is a mess is the images subdirectory. This contains shared content such as the avatar image galleries and instance-specific data such as uploaded avatars. However, the locations of this instance-specific data is configurable and I have therefore tidied this up by moving these instance-specific data to a separate subdirectory, so that images can be shared:

Config item
Type
Current Default
New Values
avatar_path Specific images/avatars/upload avatars-upload
avatar_gallery_path Shared images/avatars/gallery images/avatars/gallery
upload_icons_path Shared images/upload_icons images/upload_icons
ranks_path Shared images/ranks images/ranks

For the upgrade what I've done is to manually reconcile the various NL directory hierarchies in the existing installation to ensure that the new phpBB-common contains all necessary common files to support all forums. There is also a blank instance/forum which is used as a template; each NL installation is a replication of this hierarchy, creating the en ...zh directory trees. The files and avatar-upload directories are copied from the legacy NL installation to complete the setup.

Note that where individual php files must be located in a specific directory so that the script can acquire the correct working directory root, these are also sym-linked to the corresponding common copy (e.g. forum/config.php -> ../../common/forum/config.php). The config.php file has been modified to pick up the NL designator from the current path, enabling the same config.php to be used for all NL instances.

For a detailed listing see Standard NL configuration in the detailed implementation notes.

This may all sound complex and inefficient, but it actually turns out to be better that the current policy of private instance copies of code and templates. The php5 engine's code caching mechanism understands where the individual instance is symlinked to common code, so code caching is more effective as the code is shared across NL Forum instances. Where code is dynamically modified to reflect instance-specific configuration (as in the case of the template based code), the phpBB caching system means that the material overhead of this redirection where the code is also largely removed.

Database Organisation

For various reasons we have decided to move the forums from the existing PosgreSQL 8.2 database to the standard MySQL 5.0 engine included in the Solaris Coolstack release. Hence the schema for the old and new NL instances are incompatible at a detailed DDL level. The DDL for the new NL Forums will therefore be based on a clean phpBB 3.0.4 install. Only the forum data content will be migrated. This content is primarily contained in a set of NL Forum-specific database tables (and the data in the instance-specific of directories as discussed in the previous section). Because of this the standard phpBB upgrade scripts will not work unmodified. I have therefore developed a migration process based on the following steps:

  • The content of each NL database is dumped to SQL.gz file within the temporary Apache directory /backups/migration. I have written a small shell script to do this (as the old system doesn't have Python installed).
  • These dump files are then pulled onto the new system using wget.
  • A custom python script is then used to convert this PostgreSQL formatted phpBB V3.0.1 dump into a MySQL 5.0 / phpBB V3.0.4 compatible load set. These converted dump can then be loaded into empty MySQL php V3.0.4 schemas.
  • Note that we have an historical issue that the set-up of each NL-specific configuration was slightly different. The conversion utlity therefore contains some limited exception rules to handle these cases (basically developed by debugging dry-run coversions).
  • Also during the various releases from phpBB V3.0.1 through V3.0.4, the software upgrades required small changes to the data content (typically adding some extra parameters to the phpbb_xx_config tables). I have reviewed these changes, both the variations in NL Forum configuration and the upgrade additions. For the specific case of these phpbb_xx_config tables, I have passed a summary of this review to the NL Forum Administrators, to ensure that all NL instances are properly reviewed, synchronised and consistent with any phpBB V3.0.4 application assumptions. For the remaining change I have patched the standard phpBB database_upgrade.php utility.
  • However, this all takes time and our end-user communities will be continuing to use and to add content to the forums whilst we are doing this. We cannot afford to bring the forums off-line during this review period. Hence I have implemented a two stage migration where I do most of the php changes and configuration changes in a first stage, where I can re-migrate the user and post content to pick up any changes in this time-window in the second stage. I call this second stage "Live Resynchronisation" and discuss it in further detail in the remainder of this section. In essence this involves an export-convert-data load which is a variant of the above steps but for a subset of tables.

Live Resynchronisation exploits the natural structure of the phpBB data model, and specifically the phpBB schema tables can be effectively grouped into two categories: those which define the forum configuration; and those which relate to user and post data. However, I have had to rethink this approach as most NL Administrators have been unwilling to do the detailed review and preparation on the early migrated system, and so I have had fall back to doing most of the preparatory changes on live. This has meant that the boundary between the two phases has shifted with nearly all of the structure and content tables needing to be reimported at Live Migration.

  1. Tables which define the forum configuration. These largely define the structure and configuration forum, and are only changed when this configuration is changed. These tables can be migrated and refined during the pilot runs to finalise the new database layout. As long as the configuration is frozen during the migration period, e.g. no changes to the forum hierarchy, these tables don’t need to be updated. The one exception here is the forums table which contains some denormalised data elements (e.g. the number of topic in the forum) which need special treatment as discussed below.
    bbcodes, bots, config, extension_groups, extensions, icons, lang, modules, ranks, search_wordlist, search_wordmatch, smilies, styles, styles_imageset, styles_imageset_data, styles_template, styles_template_data, styles_theme, words, zebra
  2. Tables which contain post and user related data. These contain the user content that evolves over time as users register and post to topics. These need to be migrated again in the second phase and then copied over the target database.
    acl_users, attachments, banlist, bookmarks, confirm, drafts, forums_track, forums_watch, log, poll_options, poll_votes, posts, privmsgs, privmsgs_folder, privmsgs_rules, privmsgs_to, profile_fields_data, reports, reports_reasons, search_results, sessions, sessions_keys, topics, topics_posted, topics_track, topics_watch, user_group, users, warnings
    (Added following change in split) acl_groups, acl_options, acl_roles, acl_roles_data, disallow, forums, forums_access, groups, moderator_cache, profile_fields, profile_fields_lang, profile_lang, sitelist

By splitting the tables this way I can migrate the database, and we can then take some days getting it configured correctly, whilst the community continues to use the live (phpBB v3.0.1) NL Forums. I have now dress-rehearsed the remigration the post and user related data, a couple of times and so that we can refine the rest of the configuration during testing in parallel to live use of the database and have soome confidence in the migration and its timing. As Live Resynchronisation does not mirror groups, role based access to forums, etc., these must be be frozen over this short period, or any necessary changes will need to be manually resynchronised.

On a first load the full schema is loaded and all tables. On resynchronisation of post and user related data, only the tables in the second group are truncated and reinserted. Also note that I copy the contents of the files and images/avatars/upload folders that have been created or updated since the initial backup on resynchronisation.

The forum table is denormalised to include information on total topic counts, details on the last post, etc. The merge script includes SQL updates to update these fields from the topics table.

The one really messy bit of this approach is that the standard canned phpBB upgrade script not only updates the schema, but also caries out a set up updates to the data content to reflect any changes in functionality implemented in the upgrade. This is scripted in function change_database_data in module phpBB-common/install/database_update.php. I couldn't use this "out of the box" so I have patched it to disable the schema changes (since these are not needed because the target systems start with a phpbBB 3.0.4 MySQL schema baseline.

Migration Process

The migration takes place in three phases, the first two of which have now been completed and I have also rehearse the third a couple of times.in its development:

  • The first involves building a complete forum set in a virtual LAMP appliance. It's just easier for me to do this on a LAN connected VM than on the target Solaris hardwar and zone in the Sun Labs in Germany over the internet. Then each of the current language databases will be snapped as a SQL backup into the forum backups directory, and then wget pulled to the LAMP appliance for loading and configuring into the test system. I have tested and tuned this LAMP configuration to validate the basic configuration.
  • I testing on this configuration was completed, I then then transfered it to the OOoTest system in Germany. Prior to the transfer I truncated the tables in group (1) before exporting the database. I then dumped the group (1) tables and refresh the test database to resync it to live. At this point the individual NL Forum leaders were given Administrator access to each forum to validate functionality and configuration.
  • Once the Administrators have confirmed the system configuration of their respective forums, we are then ready to go live. The cut-over itself will be based on a process that I developed and tested in the VirtualBox migration. This is to bring the live forums offline, then backup the current user and post data and then overlay this onto the target OOoTest server. This involves truncating and replacing the content of the following tables from live. Once this import is complete the forums are available for go-live. This cut-over process actually involves a small down time (roughly 30 mins), but in practice this will be dominated by the DNS reconfiguration times to remap user.services.openoffice.org to the new hardware.

I have included some phpBB3.0.4 Migration — Detailed Implementation Notes in a separate web page.

Forum Customisation

The delivered customised instances contains the following changes:

  • Reorder window title to make multi-tab title more usable. All browser tabs are labelled with their title. The phpBB3 default is to begin the title with the forum name then page category, hence if you have a lot of tabs open they are called openOffi... By putting the specific title first, it makes it a lot easier to pick the correct tab.
  • Add Expand option to Code Block. phpBB3 now puts each code block in a scrollable window. This makes it a pain if you want to see all of the code in the browser. This mod adds an Expand option to the code block to allow you to blow it up to full size.
  • Tweak Access key codes to remove conflicts. For those that use keyboard accelerators within the browser, the lack of a shortcut for Preview Post is a pain. This makes <Accel>+P a shortcut for Preview and <Accel>+G a shortcut for graphics.
  • Add Active Post count to View Active Topics. This just adds a count of new posts in the Active topics heading.
  • Add truncated title of last topic to Forum View. Lots of OOo volunteers asked for this one. It enables you to recognise at a glance if a forum has any new posts in it.
  • Customise Headers and Footers to OpenOffice.org branding. Add OOo Community logo and adjust framing colours to match. Add OOo favicon.
  • Rebasline to phpBB 3.0.4. The install is a patch file which updates the standard 3.0.4 unpacked tarball.
  • Add OpenOffice.org Qs to BB FAQ. The FAQ button goes to a phpBB3 standard FAQ which is all about using phpBB3. I've added a Q0 which points the user to the wiki and forum FAQs for openOffice.org info.
  • Add instruction to “read Survival Guide before posting” to post form. The posting form now includes an instruction to read the survival guide before you post. (This is only shown if you have less than 15 posts).
  • Optional Mandatory Subject on Registration. This is a customisation used in the French forum, so I have added this to avoid having to withdraw existing functionality.
  • Customisations that are configurable through phpBB. These don't need code or style changes:
    • Add custom fields to define users, main OS, installed OOo version and supplementary OSs.
    • Add additional BBcodes for [edit], [pre], [strike], [center], [sub], [super], [hr]

All these changes where developed and tested on a VM sandpit and a patch file was used to generate the delta to apply to the target system.

Cut-Over

The cut-over took place on Mon 11th May, with the service being offline from 13:00-14:00 CET and in this period we migrated and upgraded all forums. There was one matieral problem that occurred. Prior to cutover I made some changes to the mysql.cnf file to bring it inline with the standard large database model. The migration uility batched the inserts into blocks of 1,000 rows and one of the settings proved be on a failure cusp resulting in one insert block out of the hundreds executed failing with an

   ERROR 1153 (08S01) at line 112886: Got a packet bigger than 'max_allowed_packet' bytes

so a block of a thousand posts got lost from the FR forum. (Yes I should have had a batch totals check, but the first time this happened was in the live cut-over.) We was noticed this mismatch an hour or so after go live, but by this point I had already run a resynchronise counts utility which cleans up some of the tracking data associated with posts. Restoring the dropped posts from the old D/B was easy. Recovering the tracking data is a lot more complex.

Main article: Follow-up Issues and Actions

I have a bunch of minor issue and follow up that now need to sorted post go live so I have added an extra Issues and Actions page to track these. I regret that we've had to leave so many of these to post go live, but the realities are that no matter how much your prepare and rehearse and ask people to review and test, such is human nature that people don't give you this feedback until they see the problems on the live system.

Personal tools