User:TerryE/phpBB3.0.4 Migration/Cache Tuning
A PHP Cache Overview and Cache Tuning
- This is an overview document. No further work is currently planned but I might update this document in line with the current phpBB version, in which case I will move into the correct sub-section. Activities on this page/sub-section are now closed.
Many facets of computers systems have roughly followed a Moore's Law-like trend over the last four decades, such as memory capacity, memory and CPU cycle times, disk capacity. Yet others have barely maintained a steady linear improvement (typically where physics introduces hard constraints). A good example is hard disk access times which have barely improved by an order of magnitude at the same time as memory capacities have improved by six orders. Caching Technologies are often introduced to mitigate and sometime effectively to eliminate the consequences of these linear constraints, and in a modern computer systems there can be many layers of caching going on, but such caching systems can 'fight' and in the worst case you can be hit by cache coherency issues. This note is about how these issues impact the performance of the phpBB V3.0.4 OpenOffice.org user forums.
- PHP Cache
- The PHP architecture splits the execution of PHP code into two stages: the compilation of source into bytecode and its subsequent execution. This facilitates caching within Apache so that the relatively expensive compile operations can be carried out on a just-in-time basis and stored in a cache. Since complex applications such as phpBB, Drupal and MediaWiki have a large source base, this compilation would normally take perhaps 50-75% of CPU cycles; an effective caching strategy can therefore result in a 2-4 x decrease in CPU loading ,. There are a number of PHP accelerators avaialble but since our forums run on a Solaris Coolstack (CSK) stack, I have adopted the APC cache which is included in the CSK 1.3.1 distribution. Enabling the APC cache will by default cause any compiled code to be loaded into an LRU cache, and on our system I have initially allocated 64Mb to this cache. The APC can also store and retrieve data by key.
- MySQL Cache
- The MySQL database engine maintains a number of caches to accelerate performance: the main caches buffer key indexes in memory, but data rows, execution plans and query results are also cached. The MySQL query cache on our server is currently configured at 32Mb, The caching algorithm  is similar to one I discuss below for phpBB SQL caching. Whilst the effectiveness such caches is limited by the need to maintain coherences (so results relating to table X are flushed when ever X is updated), whereas the application tier can understand the context of the result sets and avoid some such flushed. Nonetheless on our system, MySQL cache hit rates are over 75%.
- File System Cache
- *nix file systems maintain large directory and buffer caches in memory . These enable very efficient access to frequently used files and in many cases can obviate the need for disk I/O altogether. However, any file updates do have to be flushed back to disk in order to maintain file system on-disk integrity. Under normal circumstances this won't slow the application, because such writes are asynchronous to the application. However the overall seek / transfer rate on to physical devices represents a hard physical upper constrains and the application can reach a tipping point where the demand flush rate exceeds the corresponding, with the application becoming I/O bound at that point.
PhpBB Application Caching
The phpBB application is design to deployed from low-end shared-service implementations running small forums though to large forums running on dedicated hardware; it has to largely neutral to host OS, database and http server implementations, and yet deliver a competitive runtime efficiency. The implementation therefore makes minimal assumptions about the underlying system stack, but in an attempt to accelerate execution performance phpBB implements a number of caching strategies around the use of dynamically generated PHP code maintained in a cache directory. One of the reasons here is that if the server stack also includes a PHP accelerator, then these dynamic modules will also be cached at the bytecode level.
The application generated code falls into the following categories:
- Template Caching
- The phpBB applications architecture separates business logic from presentation and National Language (NL) support. One of the key implementation elements here is that there is a HTML webpage associated with each display page. This page has an extended markup which enable conditional and loop control code as well as application and NL field embedding. Since this extension is not supported by the HTML engine itself, such pages need processing to render the HTML. In order to avoid the high overhead associated with this processing, each template HTML is processed on first reference and converted into the corresponding PHP page. These PHP pages are stored in the forum's cache directory with the name tpl_<style>_<function/page>.html.php,
- Dataset Caching
- phpBB also uses this cache whenever it needs to maintain persistent data. The two main access methods are put and get. The put has an optional an optional time-to-live (ttl) argument which defines the time validity of the stored object (with a default of 365 days). If the data is retrieved within this time this value is returned. However, storage is not assured (as the cache might be purged or be size limited) so the calling layer must also handle the need to recreate the object on cache-miss.
- SQL Query Caching
- The unified database implementation has a common access method sql_query which uses this dataset caching. The method has an optional time-to-live (ttl) argument which is set to a non-zero value by the application layer if it can tolerate recent albeit not current data within the application context. The default ttl is zero and in this case the cache is bypassed entirely and the query is passed to the database layer. The cache is used for for non-zero ttls, with the query object being 'uniquely' identified by a moniker based on its MD5 digest, and its content the serialised result set. In the case of a cache hit,the database call is optimised away, and the query result set is obtained by deserialising the cached object. In the case of a cache miss, the sql query is passed to the database layer, and the cache is reloaded with the result set before it is returned to the calling application.
The template cache has extremely low volatility and therefore creating this dynamic code is an extremely effective mechanism for avoiding one-off processing. The number of code objects is also relatively small (typically less than 100). So this approach is also well matched to PHP accelerator technologies.
One the other hand, the data set and SQL query caches can generate (on our forums) hundreds of thousands of small and reasonably volatile objects. The decision to move these into PHP code fragments was a mistake as this will tend to flush any PHP accelerator cache. On my system, I have realised a small performance improvement by setting up an APC filter to exclude these files from the PHP cache.
Using the filestore to cache these is a good compromise, as is the use of an accelerator such memcache or even the APC data store, but not the APC code cache, as encoding them as a PHP file as opposed to a raw serialize stream seems an odd design decision. However, even the file based cache can create performance problems. We have well over a hundred forums across the NL forums. The highlighted query was generating 2-5 file creates and deletes per second whilst the query itself was reasonably well cached in MySQL and registered as a runtime of 0.00s, so I removed the ttl for this which resulted in a noticeable improvement in system performance.
Note: You need to be careful with using any PHP accelerator which assumes that such file content is idempotent. For example filename moniker is based on the SQL query alone, and this won't change even if the phpBB caching layer itself refreshes the content. Hence unless you have file modification detection enabled:
phpBB will end up including stale content. Whilst this is the default setting, many server configuration guides recommend setting apc.stat=0 as this will result in improved filesystem and cache, so in some circumstances this will lead to somewhat bizarre results for the queries below. (In my case this was compounded by a bug in the CSK 1.3.1 implementation of APC doesn't process this setting properly.)
|includes/functions.php||60||SELECT COUNT(DISTINCT s.session_ip) as num_guests FROM phpbb3_sessions s WHERE s.session_user_id = 1 AND session_time >= N AND s.session_forum_id = N|
|style.php||300||SELECT s.style_id, c.theme_id, c.theme_data, c.theme_path, c.theme_name, c.theme_mtime, i.*, t.template_path FROM styles s, styles t, styles_theme c, styles_imageset i WHERE s.style_id = N AND t.template_id = s.template_id AND c.theme_id = s. theme_id AND i.imageset_id = s.imageset_id|
|includes/functions_privmsgs.php||300||SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forum ORDER BY left_id ASC|
|includes/functions_admin.php||360||SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC|
|includes/functions_admin.php||600||SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC|
|includes/message_parser.php||600||SELECT * FROM smilies ORDER BY LENGTH(code) DESC|
|includes/functions_display.php||3600||SELECT m.*, u.user_colour, g.group_colour, g.group_type FROM (moderator_cache m) LEFT JOIN phpbb3_users u ON (m.user_id = u.user_id) LEFT JOIN phpbb3_groups g ON (m.group_id = g.group_id) WHERE m.display_on_index = 1 AND m.forum_id in (list)|
|includes/functions_display.php||3600||SELECT forum_name FROM forums WHERE forum_id = N|
|includes/functions_posting.php||3600||SELECT * FROM smilies WHERE display_on_posting = 1 ORDER BY smiley_order|
|includes/session.php||3600||SELECT ban_ip, ban_userid, ban_email, ban_exclude, ban_give_reason, ban_end FROM banlist WHERE ... ';|
|style.php||3600||SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, t.template_inherits_id, t.template_inherit_path, c.theme_path, c.theme_name, c.theme_storedb, c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name FROM styles s, styles_template t, styles_theme c, styles_imageset WHERE s.style_id = $style AND t.template_id = s.template_id AND c.theme_id = s.theme_id AND i.imageset_id = s.imageset_id|
|includes/session.php||3600||SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, c.theme_path, c.theme_name, c.theme_storedb, c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name FROM styles s, styles_template t,styles_theme c, styles_imageset i WHERE s.style_id = $style AND t.template_id = s.template_id AND c.theme_id = s theme_id AND i.imageset_id = s.imageset_id";|
|includes/session.php||3600||SELECT image_name, image_filename, image_lang, image_height, image_width FROM styles_imageset_data WHERE imageset_id = 'N' AND image_filename <> AND image_lang IN ( language list)";|
|includes/bbcode.php||3600||SELECT * FROM bbcodes WHERE ...|
|includes/acp/acp_profile.php||3600||SELECT forum_name FROM forums WHERE forum_id = N|
|includes/acp/acp_permissions.php||3600||SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC|
|includes/acp/auth.php||3600||SELECT forum_id, enable_indexing FROM forums|
|includes/acp/acp_search.php||3600||SELECT forum_id, enable_indexing FROM forums|
|includes/functions_admin.php||3600||SELECT forum_name FROM forums ' . FORUMS_TABLE WHERE forum_id = N|
|includes/session.php||3600||SHOW TABLE STATUS FROM db_name|
|includes/mcp/mcp_queue.php||3600||SELECT forum_id, forum_name FROM forums WHERE forum_id in (list);|
|style.php||3600||SELECT s.style_id, c.theme_id, c.theme_data, c.theme_path, c.theme_name, c.theme_mtime, i.*, t.template_path FROM styles s, styles_template t, styles_theme c, styles_imageset i WHERE s.style_id = N AND t.template_id = s.template_id AND c.theme_id = s.theme_id AND i.imageset_id = s.imageset_id|