Difference between revisions of "User:TerryE/phpBB3.0.4 Migration/Cache Tuning"

From Apache OpenOffice Wiki
Jump to: navigation, search
(PhpBB Application Caching)
(PhpBB Application Caching)
Line 31: Line 31:
 
The template cache has extremely low volatility and therefore creating this dynamic code is an extremely effective mechanism for avoiding one-off processing.  The number of code objects is also relatively small (typically less than 100).  So this approach is also well matched to PHP accelerator technologies.  
 
The template cache has extremely low volatility and therefore creating this dynamic code is an extremely effective mechanism for avoiding one-off processing.  The number of code objects is also relatively small (typically less than 100).  So this approach is also well matched to PHP accelerator technologies.  
  
One the other hand, the data set and SQL query caches can generate (on our forums) hundreds of thousands of small and reasonably volatile objects.  The decision to move these into PHP code fragments was a fundamental mistake as this can flush any PHP accelerator cache.  On my system, I have improved performance by setting up an APC filter to exclude these files from the PHP cache.   
+
One the other hand, the data set and SQL query caches can generate (on our forums) hundreds of thousands of small and reasonably volatile objects.  The decision to move these into PHP code fragments was a fundamental mistake as this can flush any PHP accelerator cache.  On my system, I have actually improved performance by setting up an APC filter to exclude these files from the PHP cache. 
 +
  apc.cache_by_default=1
 +
  apc.filters="(/cache/sql_|/cache/data_)"
 +
which seems a shame because using the filestore to cache these is a good compromise, as is the use of an accelerator such memcache or even the APC ''data store'', but not the APC code cache.   
  
are not as they can be 
+
'''Note:'''
 
+
You need to be careful with using any PHP accelerator which assumes that such file content is [[Wikipedia:idempotent|idempotent]]For example filename moniker is based on the SQL query alone, and this won't change even if the phpBB caching layer itself refreshes the contentHence unless you have file modification detection enabled:
 
+
   apc.stat=1
The results of any such query are cached for ''ttl'' seconds, if ''ttl'' is no-zero, so if the query is repeated during this period then the cached result set is used instead of requerying the databaseThis cache is currently used for a small number of high-frequency queries as shown in the Appendix.  A standard strategy is used:  In the default phpBB implementation, these cache objects are stored as files in the cache directory.  The query routine check for the existance of this file and if so executes a phpBB include on it.  The included code contains two assignments: one to return serialised rowset and the other a ''ttl'' expiry stamp (see following example)If ''ttl'' has expired or the file doesn't exist then the database is queried to obtain the rowset, and this is then serialised and used to create a new copy of the file.  If the ''ttl'' is still valid windows then the cache result is deserialised and returned as a rowset.
+
phpBB will end up including stale content. Note that whilst this is the default setting, many server configuration guides recommend against this because of the slight performance degradation, so in some circumstances this will lead to somewhat bizarre results for the queries below(In my case this was compounded by a bug in the CSK 1.3.1 implementation of APC doesn't process this setting properly.)
   /* SELECT COUNT(DISTINCT s.session_ip) as num_guests FROM phpbb_en_sessions s \
+
    WHEREs.session_user_id = 1 AND s.session_time >= 1242322740 AND s.session_forum_id = 61  */
+
  $expired = (time() > 1242323134) ? true : false;
+
  if ($expired) { return; }
+
  $this->sql_rowset[$query_id] = unserialize('a:1:{i:0;a:1:{s:10:"num_guests";s:1:"0";}}');
+
 
+
 
+
: This code sets the boolean $expired and if false then also returns the results set. If true or the file doesn't exist then the execution does the SQL query then creates this file (but only for what the application deems as reusable queries). With this phpBB cache the results are latched for a specific time (in this example the expiry time was some 6 mins after the start of the session) even if the underlying DB data has changed. Thanks to the APC, the code for this is also cached to executing this is quick. This approach is different to the MySQL cache, where the execution plan for each SQL queries is cached in an LRU buffer. This saves parse costs, but the query is still re-evaluated each call.
+
 
+
 
+
One of the major issues with using a file based cache for this is that some of these sql queries are quite volatile and therefore generate a lot of small fragmentary sql_*.php files. filling the cacheWhat is a little odd here is that MySQL itself caches this type of query in the MySQL query cache [http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/]; they typically execute in a few mS within MySQL, so this file based strategy can actually work out slower.
+
 
+
:Any time phpBB executes a SQL query, style template, or dataset retrieval, it creates an md5 of the sql query / tempate and stores this under the name sql_<md5Value> in the cache (or euqivalent for the other types).  Before executing the query, it does an existence check for the file andexecutes an @include it to evaluate it.  PHP loades the cached bytecode version instead. Here is a typical example
+
  
 
== Appendix ==
 
== Appendix ==

Revision as of 13:27, 23 May 2009

A PHP Cache Overview and Cache Tuning

Many facets of computers systems have roughly followed a Moore's Law-like trend over the last four decade, such as memory capacity, memory and CPU cycle times, disk capacity. Yet others have barely maintained a steady linear improvement (typically where physics introduces hard constraints). A good example is hard disk access times which have barely improved by an order of magnitude at the same time as memory capacities have improved by six orders. Caching Technologies are often introduced to mitigate and sometime effectively to eliminate the consequences of these linear constraints, and in a modern computer systems there can be many layers of caching going on, but such caching systems can 'fight' and in the worst case you can be hit by cache coherency issues. This note is about how these issues impact the performance of the phpBB V3.0.4 OpenOffice.org user forums.

System Caches

  • PHP Cache
The PHP architecture splits the execution of PHP code into two stages: the compilation of source into bytecode and its subsequent execution. This facilitates caching within Apache so that the relatively expensive compile operations can be carried out on a just-in-time basis and stored in a cache. Since complex applications such as phpBB, Drupal and MediaWiki have a large source base, this compilation would normally take perhaps 50-75% of CPU cycles; an effective caching strategy can therefore result in a 2-4 x decrease in CPU loading [1],[2]. There are a number of PHP accelerators avaialble but since our forums run on a Solaris Coolstack (CSK) stack, I have adopted the APC cache which is included in the CSK 1.3.1 distribution. Enabling the APC cache will by default cause any compiled code to be loaded into an LRU cache, and on our system I have initially allocated 64Mb to this cache. The APC can also store and retrieve data by key.
  • MySQL Cache
The MySQL database engine maintains a number of caches to accelerate performance: the main caches buffer key indexes in memory, but data rows, execution plans and query results are also cached. The MySQL query cache on our server is currently configured at 32Mb, The caching algorithm [3] is similar to one I discuss below for phpBB SQL caching. Whilst the effectiveness such caches is limited by the need to maintain coherences (so results relating to table X are flushed when ever X is updated), whereas the application tier can understand the context of the result sets and avoid some such flushed. Nonetheless on our system, MySQL cache hit rates are over 75%.
  • File System Cache
*nix file systems maintain large directory and buffer caches in memory [4]. These enable very efficient access to frequently used files and in many cases can obviate the need for disk I/O altogether. However, any file updates do have to be flushed back to disk in order to maintain file system on-disk integrity. Under normal circumstances this won't slow the application, because such writes are asynchronous to the application. However the overall seek / transfer rate on to physical devices represents a hard physical upper constrains and the application can reach a tipping point where the demand flush rate exceeds the corresponding, with the application becoming I/O bound at that point.

PhpBB Application Caching

The phpBB application is design to deployed from low-end shared-service implementations running small forums though to large forums running on dedicated hardware; it has to largely neutral to host OS, database and http server implementations, and yet deliver a competitive runtime efficiency. The implementation therefore makes minimal assumptions about the underlying system stack, but in an attempt to accelerate execution performance phpBB implements a number of caching strategies around the use of dynamically generated PHP code maintained in a cache directory. One of the reasons here is that if the server stack also includes a PHP accelerator, then these dynamic modules will also be cached at the bytecode level.

The application generated code falls into the following categories:

  • Template Caching
The phpBB applications architecture separates business logic from presentation and National Language (NL) support. One of the key implementation elements here is that there is a HTML webpage associated with each display page. This page has an extended markup which enable conditional and loop control code as well as application and NL field embedding. Since this extension is not supported by the HTML engine itself, such pages need processing to render the HTML. In order to avoid the high overhead associated with this processing, each template HTML is processed on first reference and converted into the corresponding PHP page. These PHP pages are stored in the forum's cache directory with the name tpl_<style>_<function/page>.html.php,
  • Dataset Caching
phpBB also uses this cache whenever it needs to maintain persistent data. The two main access methods are put and get. The put has an optional an optional time-to-live (ttl) argument which defines the time validity of the stored object (with a default of 365 days). If the data is retrieved within this time this value is returned. However, storage is not assured (as the cache might be purged or be size limited) so the calling layer must also handle the need to recreate the object on cache-miss.
  • SQL Query Caching
The unified database implementation has a common access method sql_query which uses this dataset caching. The method has an optional time-to-live (ttl) argument which is set to a non-zero value by the application layer if it can tolerate recent albeit not current data within the application context. The default ttl is zero and in this case the cache is bypassed entirely and the query is passed to the database layer. The cache is used for for non-zero ttls, with the query object being 'uniquely' identified by a moniker based on its MD5 digest, and its content the serialised result set. In the case of a cache hit,the database call is optimised away, and the query result set is obtained by deserialising the cached object. In the case of a cache miss, the sql query is passed to the database layer, and the cache is reloaded with the result set before it is returned to the calling application.

The template cache has extremely low volatility and therefore creating this dynamic code is an extremely effective mechanism for avoiding one-off processing. The number of code objects is also relatively small (typically less than 100). So this approach is also well matched to PHP accelerator technologies.

One the other hand, the data set and SQL query caches can generate (on our forums) hundreds of thousands of small and reasonably volatile objects. The decision to move these into PHP code fragments was a fundamental mistake as this can flush any PHP accelerator cache. On my system, I have actually improved performance by setting up an APC filter to exclude these files from the PHP cache.

 apc.cache_by_default=1
 apc.filters="(/cache/sql_|/cache/data_)" 

which seems a shame because using the filestore to cache these is a good compromise, as is the use of an accelerator such memcache or even the APC data store, but not the APC code cache.

Note: You need to be careful with using any PHP accelerator which assumes that such file content is idempotent. For example filename moniker is based on the SQL query alone, and this won't change even if the phpBB caching layer itself refreshes the content. Hence unless you have file modification detection enabled:

 apc.stat=1

phpBB will end up including stale content. Note that whilst this is the default setting, many server configuration guides recommend against this because of the slight performance degradation, so in some circumstances this will lead to somewhat bizarre results for the queries below. (In my case this was compounded by a bug in the CSK 1.3.1 implementation of APC doesn't process this setting properly.)

Appendix

File
TTL
Query
includes/functions_display.php 3600 SELECT m.*, u.user_colour, g.group_colour, g.group_type FROM (moderator_cache m) LEFT JOIN phpbb3_users u ON (m.user_id = u.user_id) LEFT JOIN phpbb3_groups g ON (m.group_id = g.group_id) WHERE m.display_on_index = 1 AND m.forum_id in (list)
includes/functions_display.php 3600 SELECT forum_name FROM forums WHERE forum_id = N
includes/functions.php 3600 SELECT COUNT(DISTINCT s.session_ip) as num_guests FROM phpbb3_sessions s WHERE s.session_user_id = 1 AND session_time >= N AND s.session_forum_id = N
includes/message_parser.php 600 SELECT * FROM smilies ORDER BY LENGTH(code) DESC
includes/functions_posting.php 3600 SELECT * FROM smilies WHERE display_on_posting = 1 ORDER BY smiley_order
includes/session.php  ? SELECT ban_ip, ban_userid, ban_email, ban_exclude, ban_give_reason, ban_end FROM banlist WHERE ... ';
3600 SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, t.template_inherits_id, t.template_inherit_path, c.theme_path, c.theme_name, c.theme_storedb, c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name FROM styles s, styles_template t, styles_theme c, styles_imageset i WHERE s.style_id = $style AND t.template_id = s.template_id AND c.theme_id = s.theme_id AND i.imageset_id = s.imageset_id
3600 SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, c.theme_path, c.theme_name, c.theme_storedb, c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name FROM styles s, styles_template t,styles_theme c, styles_imageset i WHERE s.style_id = $style AND t.template_id = s.template_id AND c.theme_id = s theme_id AND i.imageset_id = s.imageset_id";
3600 SELECT image_name, image_filename, image_lang, image_height, image_width FROM styles_imageset_data WHERE imageset_id = 'N' AND image_filename <> AND image_lang IN ( language list)";
includes/bbcode.php 3600 SELECT * FROM bbcodes WHERE ...
includes/acp/acp_profile.php 3600 SELECT forum_name FROM forums WHERE forum_id = N
includes/acp/acp_permissions.php 3600 SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC
includes/acp/auth.php 3600 SELECT forum_id, enable_indexing FROM forums
ncludes/acp/acp_search.php 3600 SELECT forum_id, enable_indexing FROM forums
includes/functions_content.php 3600 SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC
includes/functions_admin.php 360 SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC
600 SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forums ORDER BY left_id ASC
3600 SELECT forum_name FROM forums ' . FORUMS_TABLE WHERE forum_id = N
3600 SHOW TABLE STATUS FROM db_name
includes/mcp/mcp_queue.php 3600 SELECT forum_id, forum_name FROM forums WHERE forum_id in (list);
style.php 300 SELECT s.style_id, c.theme_id, c.theme_data, c.theme_path, c.theme_name, c.theme_mtime, i.*, t.template_path FROM styles s, styles t, styles_theme c, styles_imageset i WHERE s.style_id = N AND t.template_id = s.template_id AND c.theme_id = s. theme_id AND i.imageset_id = s.imageset_id
3600 SELECT s.style_id, c.theme_id, c.theme_data, c.theme_path, c.theme_name, c.theme_mtime, i.*, t.template_path FROM styles s, styles_template t, styles_theme c, styles_imageset i WHERE s.style_id = N AND t.template_id = s.template_id AND c.theme_id = s.theme_id AND i.imageset_id = s.imageset_id
includes/functions_privmsgs.php 300 SELECT forum_id, forum_name, parent_id, forum_type, left_id, right_id FROM forum ORDER BY left_id ASC
Personal tools