Difference between revisions of "User:TerryE/phpBB3.0.4 Migration/Standardisation Strategy"

From Apache OpenOffice Wiki
Jump to: navigation, search
(Overview of the Migration Philosophy)
(Overview of the Migration Philosophy)
Line 72: Line 72:
 
Previous to this migration I did a migration of the [http://forums.virtualbox.org/ Virtualbox Community Forums] from phpBB V2 to V3.0.4, and used our OOo configuration and customisations as a starting point for this target phpBB implementation.  I reviewed all of the OOo customisations in terms of their value and volunteer feedback, and junked any that haven't had a proven benefit.  This "junked" list includes my experimental kB patch; the heavy OOo customisation of the standard phpBB FAQ customisation; the beginners template; the email link to the '''forumadmin''' on all pages. I then regressed all remaining changes onto the standard phpBB V3.0.4 to derive a patch to implement the required changes.  With the exception of the VirtualBox vs. OOo specific branding in the headers and footers, this work largely applies to our target configuration as well.  
 
Previous to this migration I did a migration of the [http://forums.virtualbox.org/ Virtualbox Community Forums] from phpBB V2 to V3.0.4, and used our OOo configuration and customisations as a starting point for this target phpBB implementation.  I reviewed all of the OOo customisations in terms of their value and volunteer feedback, and junked any that haven't had a proven benefit.  This "junked" list includes my experimental kB patch; the heavy OOo customisation of the standard phpBB FAQ customisation; the beginners template; the email link to the '''forumadmin''' on all pages. I then regressed all remaining changes onto the standard phpBB V3.0.4 to derive a patch to implement the required changes.  With the exception of the VirtualBox vs. OOo specific branding in the headers and footers, this work largely applies to our target configuration as well.  
  
The main difference between the OOo and VirtualBox implementations is that VirtualBox has a single production forum, whereas we maintain a number of separate NL forums within OOo.  Nonetheless, the base software configuration will be shared by all forums and the configuration set ups will be largely consistent across all forums.  This will enable a common set of language templates to be maintained across all forums, and hence all languages will be available to the individual language forum Administrators (though these Administrators might only chose to enable a subset of languages within their own forum).  Certain configuration items are forum-specific rather than language-specific.  An example of this is that an advisory message is displayed in the post form asking users to read the Survival Guide before posting.  We will have a Survival Guide in every forum, so the language-specific instruction can a direct translation of a common instruction, that this the wording can be equivalent in content for every language and forum, so this text will be in the language template.  However, the link to actual guide is forum specific and this will be maintained in the forum's system configuration table:  
+
The main difference between the OOo and VirtualBox implementations is that VirtualBox has a single production forum, whereas we maintain a number of separate NL forums within OOo.  Key to making a scalable implementation is the recognition that some data is forum content or (NL) forum-specific and some is intrinsically common across all NL forums.  What we need to do is to establish a clear demarcation between this instance-specific and the common configuration.  What complicates this is that the phpBB implementation uses both a database and a filestore based directory hierarchy.  However, I have adopted some simple rules to help here:
 +
 
 +
* Database stored content is always treated as instance-specific.  Each NL instance has its own schema which is identified by it (lowercase) two letter identifier.  This is identifier is also embedded into the '''phpbb_''xx''_''' tablename prefix.
 +
* Where practical, file content is segregated so that individual directories are either instance-specific and the common, but not both. 
 +
* The common directories are not duplicated but symlinked to a common base directory.
 +
* The common files are not duplicated but symlinked to a common base file.
 +
 
 +
 
 +
Nonetheless, the base software configuration will be shared by all forums and the configuration set ups will be largely consistent across all forums.  This will enable a common set of language templates to be maintained across all forums, and hence all languages will be available to the individual language forum Administrators (though these Administrators might only chose to enable a subset of languages within their own forum).  Certain configuration items are forum-specific rather than language-specific.  An example of this is that an advisory message is displayed in the post form asking users to read the Survival Guide before posting.  We will have a Survival Guide in every forum, so the language-specific instruction can a direct translation of a common instruction, that this the wording can be equivalent in content for every language and forum, so this text will be in the language template.  However, the link to actual guide is forum specific and this will be maintained in the forum's system configuration table:  
 
* Language specifics will be maintained in the language template  
 
* Language specifics will be maintained in the language template  
 
* Forum specifics will be maintained in the forum '''phpBB_''XX''_config''' table.  
 
* Forum specifics will be maintained in the forum '''phpBB_''XX''_config''' table.  

Revision as of 07:10, 6 April 2009

OOo Community Forums Migration Strategy

The current openOffice.org (“OOo”) forums (“Community Forums”) started with the creation of the English forum in Nov 2007 running on phpBB 3.0.0 over a Sun Coolstack platform and PostgreSQL 8.1 running on Solaris x86 infrastructure in the Sun Munich Facility. We subsequently upgraded to phpBB 3.0.1 and PostgreSQL 8.2 and then added the Hungarian, French, Spanish and Japanese forums. We are now in the process of adding the Vietnamese and Chinese forums. Given the proliferation of forums, the general growth in transactional volumes and the fact that phpBB is now at version 3.0.4, we have decided to upgrade the H/W infrastructure and install a clean re-baseline for all forums. This wiki page describes the process and approach for migration the OOo forums from their current platform to the new installation.

Assumptions and Constraints

  • The Community Forums are running on a number of separate phpBB instances all at version phpBB 3.0.1 (but with disjoint code hierarchies) over a common PostgreSQL 8.2 database on a Solaris/Coolstack platform.
  • The content of the Community Forums is largely maintained in the central database, though attachment and image data are maintained in the file separate file hierarchies. All such content will be preserved on migration.
  • A new hardware platform will be running Solaris 10, plus Coolstack 1.3.1 including the optional CSKPython package. All instances will use the 32bit CSK packaged MySQL 5.0.
  • Each instance of the Community Forums implements a language-specific forum (an “NL forum”). At cut-over to the new hardware, these will comprise the following NL forums: English (en), Spanish (es), French (fr), Hungarian (hu), Japanese (ja), Vietnamese (vi) and Chinese (zh).
  • Each NL forum will have a designated lead “NL Administrator” who is responsible for final acceptance and configuration of the forum as follows:
  • Each NL forum will have a designated alternate NL Administrator who can act as a deputy to the lead.
  • The target Community Forums configuration will be a largely “out of the box” configuration of phpBB version 3.0.4 designed to facilitate the easy addition of extra language instances.
  • All NL forums will adopt the phpBB Prosilver template with minor tweaks to implement OOo branding. The Subsilver will not be supported.
  • “Supported Languages” at cut-over to the new hardware will be English (UK and US dialects), French, Spanish (Spain, Argentina, Catalan, Valencia dialects), Hungarian, Mandarin (Traditional and Simplified Scripts), Japanese and Vietnamese.
  • All Supported Languages will be available to all NL forums. However it is up to the lead NL Administrators to decided which languages are enabled for their forum. This will normally be UK English and the local languages. The lead NL Administrator will also decide the default Language for the forum.
  • Each Language Forum will contain an administration sub-forum, access to which will be limited to trusted moderators and administrators.
  • We want to move to as "Zero Admin" an approach as possible. Therefore all routine system administration functions will be scripted by means of automated (usually crontab based) scripts.
  • A common code base will be shared by all instances by sym-linking the individual instance hierarchies to a common directory set. This will be based on a fresh install of phpBB from the phpBB Downloads reference site.
  • The Community Forums will retain a small number of functional changes as detailed in section Forum Customisations below. These changes are that subset of the changes applied to current individual Language Forums, filtered by criteria based on proven user benefit of the change compared to the incremental support effort,
  • Python will be used as the standard system administration scripting Language.

Code Base

Language
Language files
Prosilver GIFs
English (U.K.) (built-in) (built-in)
English (U.S.) lang_en_us.tar.gz prosilver_en_us.tar.gz
French lang_fr.tar.gz prosilver_fr.tar.gz
Catalan lang_ca.tar.gz prosilver_ca.tar.gz
Catalan (Valencian) lang_ca_pv.tar.gz prosilver_ca_pv.tar.gz
Spanish lang_es.tar.gz prosilver_es.tar.gz
Spanish (Argentina) lang_es_ar.tar.gz prosilver_es_ar.tar.gz
Hungarian lang_hu.tar.gz prosilver_hu.tar.gz
Mandarin Chinese (Traditional script) lang_zh_cmn_hant.tar.gz prosilver_zh_cmn_hant.tar.gz
Mandarin Chinese (Simplified script) lang_zh_cmn_hans.tar.gz prosilver_zh_cmn_hans.tar.gz
Japanese lang_ja.tar.gz prosilver_ja.tar.gz
Vietnamese (unsupported translation) (unsupported translation)
  • Note that the Vietnamese is not yet available through the phpBB site.

Overview of the Migration Philosophy

To analyse the differences in the current NL forum implementations, I have written some scripts to do a smart delta of the seven current language-specific directory hierarchies. Essentially these configurations are an exercise of 'entropy in action': every configuration is slightly different, both in terms of language packs and code modifications. This means than any upgrade of the baseline phpBB package has to address seven different versions. The consequences for long term maintenance are just a killer and this is going to get worse as we add support for extra languages, so we must move to a single common code base for all forums.

Previous to this migration I did a migration of the Virtualbox Community Forums from phpBB V2 to V3.0.4, and used our OOo configuration and customisations as a starting point for this target phpBB implementation. I reviewed all of the OOo customisations in terms of their value and volunteer feedback, and junked any that haven't had a proven benefit. This "junked" list includes my experimental kB patch; the heavy OOo customisation of the standard phpBB FAQ customisation; the beginners template; the email link to the forumadmin on all pages. I then regressed all remaining changes onto the standard phpBB V3.0.4 to derive a patch to implement the required changes. With the exception of the VirtualBox vs. OOo specific branding in the headers and footers, this work largely applies to our target configuration as well.

The main difference between the OOo and VirtualBox implementations is that VirtualBox has a single production forum, whereas we maintain a number of separate NL forums within OOo. Key to making a scalable implementation is the recognition that some data is forum content or (NL) forum-specific and some is intrinsically common across all NL forums. What we need to do is to establish a clear demarcation between this instance-specific and the common configuration. What complicates this is that the phpBB implementation uses both a database and a filestore based directory hierarchy. However, I have adopted some simple rules to help here:

  • Database stored content is always treated as instance-specific. Each NL instance has its own schema which is identified by it (lowercase) two letter identifier. This is identifier is also embedded into the phpbb_xx_ tablename prefix.
  • Where practical, file content is segregated so that individual directories are either instance-specific and the common, but not both.
  • The common directories are not duplicated but symlinked to a common base directory.
  • The common files are not duplicated but symlinked to a common base file.


Nonetheless, the base software configuration will be shared by all forums and the configuration set ups will be largely consistent across all forums. This will enable a common set of language templates to be maintained across all forums, and hence all languages will be available to the individual language forum Administrators (though these Administrators might only chose to enable a subset of languages within their own forum). Certain configuration items are forum-specific rather than language-specific. An example of this is that an advisory message is displayed in the post form asking users to read the Survival Guide before posting. We will have a Survival Guide in every forum, so the language-specific instruction can a direct translation of a common instruction, that this the wording can be equivalent in content for every language and forum, so this text will be in the language template. However, the link to actual guide is forum specific and this will be maintained in the forum's system configuration table:

  • Language specifics will be maintained in the language template
  • Forum specifics will be maintained in the forum phpBB_XX_config table.

The way that phpBB establishes its root with Apache is by examining the current directory of any script that is directly executed through a URI. These are as follows:

  • forum: cron.php, faq.php, file.php, index.php, listforums.php, mcp.php, memberlist.php, posting.php, report.php, search.php, style.php, ucp.php, viewforum.php, viewonline.php, viewtopic.php
  • forum/adm:index.php, swatch.php
  • forum/download: file.php

Directory and File Organisation

Each forum will therefore have private copies of these directories along with the sub-directories which contain forum-specific files: cache, download, files, images, store. The remain subdirectories will be sym-linked to a common repository:

  • docs -> ../../common/forum/docs
  • language -> ../../common/forum/language
  • styles -> ../../common/forum/styles
  • includes -> ../../common/forum/includes
  • images -> ../../common/forum/images(+)

The individual files in the top level directory, etc., will be sym-linked to the common copy:

  • forum/config.php -> ../../common/forum/config.php(+)
  • forum/common.php -> ../../common/forum/common.php
  • forum/cron.php -> ../../common/forum/cron.php
  • forum/faq.php -> ../../common/forum/faq.php
  • forum/file.php -> ../../common/forum/file.phpx`
  • forum/index.php -> ../../common/forum/index.php
  • forum/listforums.php -> ../../common/forum/listforums.php
  • forum/mcp.php -> ../../common/forum/mcp.php
  • forum/memberlist.php -> ../../common/forum/memberlist.php
  • forum/posting.php -> ../../common/forum/posting.php
  • forum/report.php -> ../../common/forum/report.php
  • forum/search.php -> ../../common/forum/search.php

(+) The config.php file has been modified to pick up the NL designator from the current path, enabling the same config.php to be used for all NL instances. In general these subdirectories are well categorised into holding shared or instance-specific data. The one exception which is a mess is the images subdirectory. This contains shared content such as the avatar image galleries and instance-specific data such as uploaded avatars. However, the locations of this instance-specific data is configurable and I have therefore tidied this up by moving these instance-specific data to a separate instance-images subdirectory:

Config item
Current Default
New Values
avatar_path images/avatars/upload instance-images/avatars/upload
avatar_gallery_path images/avatars/gallery instance-images/avatars/gallery
upload_icons_path images/upload_icons instance-images/upload_icons
ranks_path images/ranks instance-images/ranks

This may all sound inefficient, but the phpBB caching system (which is local to each forum) means that the material overhead of this redirection where the code is dynamically modified (as in the case of the template based code) is practically removed. Also since the phpBB engine understands the symlink redirection, the php caching system is more effective as the code is shared across NL instances.

Hence the software configuration of the new forum versions will be a clean patched install. Only the forum content will be migrated. This content is largely contained in a set of forum-specific tables, though it also comprises a set of files in the cache, download, files, images and store directories. I have written a small shell script (as this runs on the old hardware which doesn't have Python installed) which can dump either all tables relating to a specific language forum or the subset of tables that contain the user and post data. I also have a second python script which converts this PostgreSQL V3.0.1 schema and dump into a MySQL V3.0.4 compatible load.

Database Organisation

One facet of the migration is the process of Live Synchronisation. This exploits the natural structure of the phpBB data model, and specifically that the phpBB tables can be grouped into the following two categories. This enables the overlay of the current user and post data onto a previous migration:

  1. Tables which define the forum configuration. These largely define the structure and configuration forum, and are only changed when this configuration is changed. These tables can be migrated and refined during the pilot runs to finalise the new database layout. As long as the configuration is frozen during the migration period, e.g. no changes to the forum hierarchy, these tables don’t need to be updated. The one exception here is the forums table which contains some denormalised data elements (e.g. the number of topic in the forum) which need special treatment as discussed below.
    acl_groups, acl_options, acl_roles, acl_roles_data, bbcodes, bots, config, disallow, extension_groups, extensions, forums, forums_access, groups, icons, knowledge_base, lang, moderator_cache, modules, profile_fields, profile_fields_lang, profile_lang, ranks, search_wordlist, search_wordmatch, sitelist, smilies, styles, styles_imageset, styles_imageset_data, styles_template, styles_template_data, styles_theme, words, zebra
  2. Tables which contain post and user related data. These contain the user content that evolve over time as users register and post to topics. These need to be migrated again in the second phase and then copied over the target database.
    acl_users, attachments, banlist, bookmarks, confirm, drafts, forums_track, forums_watch, log, poll_options, poll_votes, posts, privmsgs, privmsgs_folder, privmsgs_rules, privmsgs_to, profile_fields_data, reports, reports_reasons, search_results, sessions, sessions_keys, topics, topics_posted, topics_track, topics_watch, user_group, users, warnings

By splitting the tables this way we can migrate the database, then take some days getting it configured correctly whilst the live (phpBB v3.0.1) database is being updated. We can then remigrate the post and user related data, so that you can tweak the rest of the configuration during testing in parallel to live use of the database. It therefore does not mirror groups, role based access to forums, etc. so if these are out of sync you will need to manually resynchronise them.

On a first load the full schema is loaded and all tables. On resynchronisation of post and user related data, only the tables in the second group are truncated and reinserted.

The forum table is denormalised to include information on total topic counts, details on the last post, etc. The merge script include SQL updates to update these fields from the topics table.

Migration Process

The migration will take place in three phases:

  • The first involves building a complete forum set in a virtual LAMP appliance. It's just easier for me to do this on a LAN connected VM than on the target Solaris hardwar and zone in the Sun Labs in Germany over the internet. Then each of the current language databases will be snapped as a SQL backup into the forum backups directory, and then wget pulled to the LAMP appliance for loading and configuring into the test system. I will test and tune this LAMP configuration to validate the basic configuration.
  • Once I have tested this configuration, I will then transfer it to the OOoTest system in Germany. Prior to the transfer I will truncate the tables in group (1) before exporting the database. I then dump the group (1) tables and refresh the test database to resync it to live. At this point the individual language forum leaders will be given Administrator access to each forum to validate functionality and configuration.
  • Once the Administrators have confirmed the system configuration of their respective forums, we are then ready to go live. The cut over itself will be based on a process that I developed and tested in the VirtualBox migration. This is to bring the live forums offline, then backup the current user and post data and then overlay this onto the target OOoTest server. This involves truncating and replacing the content of the following tables from live. Once this import is complete the forums are available for go-live.

Forum Customisation

The delivered customised instances contains the following changes:

  • Reorder window title to make multi-tab title more usable. All browser tabs are labelled with their title. The phpBB3 default is to begin the title with the forum name then page category, hence if you have a lot of tabs open they are called openOffi... By putting the specific title first, it makes it a lot easier to pick the correct tab.
  • Add Expand option to Code Block. phpBB3 now puts each code block in a scrollable window. This makes it a pain if you want to see all of the code in the browser. This mod adds an Expand option to the code block to allow you to blow it up to full size.
  • Tweak Access key codes to remove conflicts. For those that use keyboard accelerators within the browser, the lack of a shortcut for Preview Post is a pain. This makes <Accel>+P a shortcut for Preview and <Accel>+G a shortcut for graphics.
  • Add Active Post count to View Active Topics. This just adds a count of new posts in the Active topics heading.
  • Add truncated title of last topic to Forum View. Lots of OOo volunteers asked for this one. It enables you to recognise at a glance if a forum has any new posts in it.
  • Customise Headers and Footers to OpenOffice.org branding. Add OOo Community logo and adjust framing colours to match. Add OOo favicon.
  • Rebasline to phpBB 3.0.4. The install is a patch file which updates the standard 3.0.4 unpacked tarball.
  • Add OpenOffice.org Qs to BB FAQ. The FAQ button goes to a phpBB3 standard FAQ which is all about using phpBB3. I've added a Q0 which points the user to the wiki and forum FAQs for openOffice.org info.
  • Add instruction to “read Survival Guide before posting” to post form. The posting form now includes an instruction to read the survival guide before you post. (This is only shown if you have less than 15 posts).
  • Customisations that are configurable through phpBB. These don't need code or style changes:
    • Add custom fields to define users, main OS, installed OOo version and supplementary OSs.
    • Add additional BBcodes for [edit], [pre], [strike], [center], [sub], [super], [hr]
Personal tools