User:TerryE/Traffic Server Configuration

== Why Traffic Server? ==

Apache Traffic Server is a lightweight yet high-performance web proxy cache that improves network efficiency and performance<ref>http://trafficserver.apache.org/</ref>. Like [[mwmanual:Squid caching|Squid]] and [[mwmanual:Varnish caching|Varnish]], Traffic Server can be configured as a reverse proxy<ref>http://trafficserver.apache.org/docs/v2/admin/reverse.htm</ref>. In this mode, it acts as a full surrogate for the back-end wiki, with port 80 on the wiki's advertised hostname resolving to Traffic Server. This enables the processing of web requests to be offloaded from the PHP- and database-intensive [[mwwiki:MediaWiki|MediaWiki]] application.

Traffic Server can be configured to store high-frequency cached content in memory, and where content is flushed to disk, access still involves significantly less physical I/O than regenerating the page through the MediaWiki application. This permits a significantly higher throughput for a given CPU and I/O resource constraint. MediaWiki has been designed to integrate closely with such web cache packages and will notify Traffic Server when a page should be purged from the cache in order to be regenerated. From MediaWiki's point of view, a correctly configured Traffic Server installation is interchangeable with Squid or Varnish.

== The architecture ==

An example setup of Traffic Server, Apache and MediaWiki on a single server is outlined below. A more complex [[mwmanual:Cache strategy|caching strategy]] may use multiple web servers behind the same Traffic Server caches (all of which can be made to appear to be a single host) or use independent servers to deliver wiki or image content.
  
 
{|
|-
| Outside world
| &lt;---&gt;
| style="border:1px solid black;" |
Server<br>
{|
|-
| style="border:1px solid black;" |
Traffic Server accelerator<br> <code>w.x.y.z:80</code>
| &lt;---&gt;
| style="border:1px solid black;" |
Apache webserver<br> <code>127.0.0.1:80</code>
|}
|}
  
To the outside world, Traffic Server appears to act as the web server. In reality it passes on requests to the Apache web server, but only when necessary. An Apache running on the same server only listens to requests from localhost (127.0.0.1) while Traffic Server only listens to requests on the server's external IP address. Both services run on port 80 without conflict as each is bound to different IP addresses.
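
As a concrete sketch of this split (the file path shown is an assumption based on a standard Debian Apache layout, not taken from the live config), Apache is restricted to the loopback address while Traffic Server binds port 80 on the external interface:

<source lang="apache">
# /etc/apache2/ports.conf (assumed location): bind Apache to loopback only
Listen 127.0.0.1:80
</source>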
== Traffic Server 3.0.1  ==
  
=== Installation and preparation ===

The ATS README and INSTALL define the Ubuntu package dependencies, so before doing the build the following packages were installed:
  
 
<source lang="bash">
sudo apt-get install autoconf automake libtool g++ libssl-dev tcl-dev expat libexpat-dev libpcre3-dev libcap-dev
</source>
  
and the system group / user was added for an '''ats-data''' account, much as the Apache server on Debian is configured to use '''www-data'''. The package was then installed by executing the following from the kit directory. The configure options mean that the kit is installed with a standard Debian layout and runs under the '''ats-data''' account:

<source lang="bash">
./configure --enable-layout=Debian --with-user=ats-data --with-group=ats-data
make
sudo make install
</source>
  
ATS uses both a memory and a disk cache to buffer content hierarchically. Memory is used for high-frequency content, but it is still important that any physical I/O overheads for accessing disk-based content are kept to an absolute minimum. Like many database packages, ATS recommends the use of a [[Wikipedia:Raw device|RAW partition]] for this disk cache, and this needs to be configured. As we use [[Wikipedia:Logical Volume Manager (Linux)|LVM2]] on the VM, it is easy to set up the partition:

<source lang="bash">
lvcreate -L 2G -C y -n ooo-wiki-TScache  ooo-wiki-data-lvgroup      # Create the ATS cache LV
modprobe raw                                                        # Load the raw device driver
sudo bash -c "echo raw >> /etc/modules"                             # And make sure that it is loaded on reboot
sudo raw /dev/raw/raw1 /dev/ooo-wiki-data-lvgroup/ooo-wiki-TScache  # Map the raw1 device to the LV
sudo chmod 660 /dev/raw/raw1                                        #   then change RW access
sudo chown ats-data:ats-data /dev/raw/raw1                          #   to ats-data
</source>
  
However, automatically adding this udev enumeration of an LVM-mapped raw device is complex and can only be done by changing the existing '''/lib/udev''' generators, so I recommend a KISS approach for now. These commands have to be executed prior to starting ATS, and as ATS uses a traditional System V init script rather than upstart, I have also adopted this convention and created a tiny '''trafficserver-raw''' script to do this. I leave this as an exercise for the reader. Once this has been done, we need to hook ATS into the rc startup system:

<source lang="bash">
sudo cp $WHATEVERYOURWOKINGDIRIS/trafficserver-raw /etc/init.d     # Hook the ATS raw helper start script into init.d
sudo update-rc.d trafficserver-raw start 91 2 3 4 5 .              # but it is only started on the multi-user runlevels
sudo ln -s /usr/bin/trafficserver /etc/init.d/trafficserver        # Hook the trafficserver start/stop script
sudo update-rc.d trafficserver start 93 2 3 4 5 . stop 07 0 1 6 .  # ATS is started after Apache and the raw helper
sudo mkdir /x1/var_log_trafficserver                               # You need to set up the log directory. ATS doesn't do this
sudo chown ats-data:ats-data /x1/var_log_trafficserver             # And it needs to be owned by ats-data or the logging will fail
sudo ln -s /x1/var_log_trafficserver /var/log/trafficserver        # Link it into the Debian standard logging directory structure
</source>
  
=== Configuration ===

The ATS Administrator's Guide discusses two simple methods of defining the configuration<ref>http://trafficserver.apache.org/docs/v2/admin/configure.htm</ref>. However, the package also provides Perl modules to facilitate configuration for those admins familiar with using Perl, so I have used these. The baseline configuration is as follows:

<source lang="perl">
#
# In this configuration, Apache Traffic Server (ATS) is configured as an HTTP reverse proxy
# connected via the IP loopback connector (127.0.0.1) to an in-VM Apache service which is running a
# dedicated MediaWiki application to serve the OOo wiki. The VM is single-core and the application
# typically services ~5-30 page requests per minute, plus associated image/CSS/JS requests needed
# to render each page. For this type of configuration/load, the default ATS requires few changes.
#
# As ATS provides a Perl module to facilitate such changes, I have used this, and only four config
# files need additions / changes.
#
BEGIN { push @INC, "/x1/wiki-kits/trafficserver-3.0.1/contrib/perl/ConfigMgmt/lib"; }

use strict;
use English;
use Sys::Hostname;
use Apache::TS::Config::Records;

sub writeFile ($$) {
  my( $outFile, $content ) = @_;
  print STDERR "Writing " . length( $content ) . " bytes to $outFile\n";
  open( FILE, ">" . $outFile ); print FILE $content; close FILE;
}

#
# Set up context.  The O/P dir can be specified but defaults to /etc/trafficserver
#
my $ATS    = "/etc/trafficserver";
my $outDir = ( $#ARGV > -1 ) ? $ARGV[0] : $ATS;

die "This procedure must be run as root to write to /etc\n" if $UID > 0 and $outDir =~ m!^/etc!;

my $publicServer = ( hostname eq 'ooowikivm' ) ? 'ooowikiv.home' : 'wiki-ooo.apache.org';
my $wikiServer   = 'localhost';

#
# Add mapping rules to remap.config
#
my $remap = qx(cat $ATS/remap.config.default) .
            "map          http://$publicServer/ http://$wikiServer/\n" .
            "reverse_map  http://$wikiServer/ http://$publicServer/\n";
writeFile "$outDir/remap.config", $remap;

#
# Create storage.config
#
my( $rawDevice, $diskSize ) = ( "/dev/raw/raw1", 2*1024*1024*1024 ); # 2Gb
writeFile "$outDir/storage.config", "$rawDevice $diskSize\n";

#
# Create records.config
#
my $cfg = new Apache::TS::Config::Records(file => "$ATS/records.config.default");

$cfg->set( conf => "proxy.config.exec_thread.autoconfig",                   val => "0"    );
$cfg->set( conf => "proxy.config.exec_thread.limit",                        val => "2"    );

$cfg->set( conf => "proxy.config.reverse_proxy.enabled",                    val => "1"    );

$cfg->set( conf => "proxy.config.cache.ram_cache.size",                     val => "64M"  );
$cfg->set( conf => "proxy.config.cache.ram_cache_cutoff",                   val => "512K" );
$cfg->set( conf => "proxy.config.cache.ram_cache.compress",                 val => "2"    );
$cfg->set( conf => "proxy.config.cache.threads_per_disk",                   val => "4"    );

$cfg->set( conf => "proxy.config.url_remap.remap_required",                 val => "1"    );
$cfg->set( conf => "proxy.config.url_remap.pristine_host_hdr",              val => "0"    );

$cfg->set( conf => "proxy.config.http.server_port",                         val => "80"   );
$cfg->set( conf => "proxy.config.http.insert_response_via_str",             val => "1"    );
$cfg->set( conf => "proxy.config.http.accept_no_activity_timeout",          val => "30"   );
$cfg->set( conf => "proxy.config.http.keep_alive_no_activity_timeout_out",  val => "5"    );
$cfg->set( conf => "proxy.config.http.negative_caching_enabled",            val => "1"    );
$cfg->set( conf => "proxy.config.http.negative_caching_lifetime",           val => "240"  );
$cfg->set( conf => "proxy.config.http.normalize_ae_gzip",                   val => "1"    );
$cfg->set( conf => "proxy.config.http.server_max_connections",              val => "100"  );

$cfg->set( conf => "proxy.config.dns.search_default_domains",               val => "0"    );

$cfg->set( conf => "proxy.config.hostdb.size",                              val => "1000" );
$cfg->set( conf => "proxy.config.hostdb.storage_size",                      val => "2M"   );

$cfg->set( conf => "proxy.config.ssl.enabled",                              val => "0"    );
$cfg->set( conf => "proxy.config.ssl.number.threads",                       val => "0"    );

$cfg->set( conf => "proxy.config.http_ui_enabled",                          val => "3"    );

#$cfg->append( line => "CONFIG proxy.config.http.enable_http_info INT 1" );
#$cfg->set( conf => "proxy.config.mlock_enabled", val => "2" );

$cfg->write( file => "$outDir/records.config" );
</source>
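
For illustration, on the production host the remap rules generated by this script (appended to the packaged remap.config.default, whose stock contents are omitted here) would take this form:

<source lang="text">
map          http://wiki-ooo.apache.org/ http://localhost/
reverse_map  http://localhost/ http://wiki-ooo.apache.org/
</source>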
  
== Configuring MediaWiki ==

Since Traffic Server captures the end-user browser requests and forwards those which require processing by Apache through the localhost loopback connector, Apache will always receive "127.0.0.1" as the direct remote address. However, as Traffic Server forwards requests to Apache, it is configured to add the "X-Forwarded-For" header so that the remote address from the outside world is preserved. MediaWiki must be configured to use the "X-Forwarded-For" header in order to correctly display user addresses in '''Special:RecentChanges'''.

The required configuration for Traffic Server is essentially the same as for Squid, with the following config assignments in '''LocalSettings.php''':
  
 
<source lang="php">
$wgUseSquid = true;
$wgSquidServers = array('127.0.0.1');
// $wgInternalServer = '';          // Internal server name as known to Squid. NOT SET.
// $wgMaxSquidPurgeTitles = 0;      // Maximum no of pages to purge in one client operation. NOT SET.
// $wgSquidMaxage = 0;              // Cache timeout for the squid. NOT SET.
$wgUseXVO = true;                   // Send X-Vary-Options header for better caching.
$wgDisableCounters = true;          // Disable collection of page counters.
$wgShowIPinHeader = false;          // Disable display of IP for guests as this frustrates caching.
</source>
  
These settings serve two main purposes:

* If a request is received from the Traffic Server cache server, the MediaWiki logs need to display the IP address of the user, not that of Traffic Server. A '''Special:RecentChanges''' in which every edit is reported as '127.0.0.1' isn't meaningful. Listing this address in '''$wgSquidServers''' lets the application know that the user IP address should be obtained from the 'X-Forwarded-For' header.
* Whenever a page or file is modified on the wiki, MediaWiki must be configured to send a PURGE notification to any caches which serve its content. '''$wgSquidServers''' contains the list of such servers. (The name is misleading: Squid was simply the first cache supported by MediaWiki.)
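
The purge notification itself is a plain HTTP request sent to each server listed in '''$wgSquidServers'''. A sketch of the request (hypothetical page title):

<source lang="text">
PURGE http://wiki-ooo.apache.org/index.php/Main_Page HTTP/1.0
Host: wiki-ooo.apache.org
</source>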
  
Note that the configuration is already tuned to support PHP APC acceleration for both MediaWiki code and metadata caching.

=== Outstanding issues ===

* '''Logging and Page Stats'''. Most inbound requests will be handled by the Traffic Server cache, so the internal stats collected by MediaWiki will only reflect cache misses. We need to think about how we handle logfile analysis and stats in general. I have turned off page counters as these would only reflect cache misses in future.
* '''Decision to retain a MediaWiki 1.15 baseline'''. For MediaWiki v1.16.x and later, internationalisation can add a material D/B load. For this and other schema changes, we've decided to stick with the last stable MW 1.15.x version (1.15.6) as the S/W baseline.
  
 
== Apache configuration ==

The Apache web server's default logging format would only list 127.0.0.1 as the connecting address. Hence an extra "cached" logging option is enabled<ref>http://httpd.apache.org/docs-2.2/mod/mod_log_config.html</ref>, and this captures the originating browser's address by using the "X-Forwarded-For" header passed by Traffic Server.

<tt>LogFormat "%{X-Forwarded-for}i&nbsp;%l&nbsp;%u&nbsp;%t \"%r\"&nbsp;%&gt;s&nbsp;%b \"%{Referer}i\" \"%{User-Agent}i\"" cached <br>CustomLog /var/log/apache2/access.log cached</tt>
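
With this format an access_log record then leads with the originating client address rather than 127.0.0.1; an illustrative record (all values hypothetical):

<source lang="text">
203.0.113.7 - - [23/Aug/2011:10:15:00 +0000] "GET /index.php/Main_Page HTTP/1.1" 200 18345 "-" "Mozilla/5.0"
</source>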

== See also ==

* [[MWmanual:Cache|Cache]]
* [[MWmanual:Squid caching|Squid caching]]

== References ==

<references />

{{ASFcopyright}}
Latest revision as of 10:15, 23 August 2011

Why Traffic Server?

Apache Traffic Server is a lightweight, yet high-performance, web proxy cache that improves network efficiency and performance[1] . Like Squid and Varnish, Traffic Server can be configured as a reverse proxy[2]. In this mode, it acts as a full surrogate for the back-end wiki with port 80 on the advertised hostname for the wiki resolving to Traffic Server. In doing so this enables the processing of web requests to be offloaded from the PHP and database intensive MediaWiki application.

Traffic Server can be configure to store high frequency cached content in memory, and where content is flush to disk, access will still invovle significantly less physical I/O than the MediaWiki application. Hence permitting a significantly higher throughput for a give CPU and I/O resource constraint. MediaWiki id been designed to integrate closely with such web cache packages and will Traffic Server when a page should be purged from the cache in order to be regenerated. From MediaWiki's point of view, a correctly-configured Traffic Server installation is interchangeable with Squid or Varnish.

The architecture

An example setup of Traffic Server, Apache and MediaWiki on a single server is outlined below. A more complex caching strategy may use multiple web servers behind the same Traffic Server caches (all of which can be made to appear to be a single host) or use independent servers to deliver wiki or image content.

Outside world  <--->  Traffic Server accelerator  <--->  Apache webserver
                      w.x.y.z:80                         127.0.0.1:80
                      (both services run on the same server)

To the outside world, Traffic Server appears to act as the web server. In reality it passes requests on to the Apache web server, but only when necessary. Apache, running on the same server, only listens for requests from localhost (127.0.0.1), while Traffic Server only listens for requests on the server's external IP address. Both services run on port 80 without conflict because each is bound to a different IP address.

Traffic Server 3.0.1

Installation and preparation

The ATS README and INSTALL define the Ubuntu package dependencies, so before doing the build the following packages were installed:

sudo apt-get install  autoconf  automake libtool g++ libssl-dev tcl-dev expat libexpat-dev libpcre3-dev libcap-dev

and a system group / user was added for an ats-data account, much as the Apache server on Debian is configured to use www-data. The package was then built and installed by executing the following from the kit directory. The configure options mean that the kit is installed with a standard Debian layout and runs under the ats-data account:

./configure --enable-layout=Debian --with-user=ats-data --with-group=ats-data
make
sudo make install

ATS uses both a memory and a disk cache to buffer content hierarchically. Memory is used for high-frequency content, but it is still important that any physical I/O overheads for accessing disk-based content are kept to an absolute minimum. Like many database packages, ATS recommends the use of a raw partition for this disk cache, and this needs to be configured. As we use LVM2 on the VM, it is easy to set up the partition:

sudo lvcreate -L 2G -C y -n ooo-wiki-TScache ooo-wiki-data-lvgroup  # Create the ATS cache LV
sudo modprobe raw                                                   # Load the raw device driver
sudo bash -c "echo raw >> /etc/modules"                             # And make sure that it is loaded on reboot
sudo raw /dev/raw/raw1 /dev/ooo-wiki-data-lvgroup/ooo-wiki-TScache  # Map the raw1 device to the LV
sudo chmod 660 /dev/raw/raw1                                        #   then change RW access 
sudo chown ats-data:ats-data /dev/raw/raw1                          #   to ats-data

However, automatically adding this udev enumeration of an LVM-mapped raw device is complex and can only be done by changing existing /lib/udev generators, so I recommend a KISS approach for now. These commands have to be executed prior to starting ATS, and as ATS uses a traditional System V init script rather than upstart, I have also adopted this convention and created a tiny trafficserver-raw script to do this. I leave this as an exercise for the reader. Once this has been done, we need to hook ATS into the rc startup system:
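Since the script itself is left as an exercise, here is one minimal sketch of what /etc/init.d/trafficserver-raw could contain, assuming the device names and ats-data account created above (the stop action is a no-op because the raw binding simply disappears on reboot):

```shell
#!/bin/sh
# trafficserver-raw -- re-create the ATS raw device binding before ATS starts.
# Sketch only: device and LV names below match the commands above and may
# differ on your system.

RAWDEV=/dev/raw/raw1
LVDEV=/dev/ooo-wiki-data-lvgroup/ooo-wiki-TScache

do_start() {
    modprobe raw                       # load the raw driver (no-op if already loaded)
    raw "$RAWDEV" "$LVDEV"             # bind raw1 to the cache logical volume
    chmod 660 "$RAWDEV"                # then restrict read/write access ...
    chown ats-data:ats-data "$RAWDEV"  # ... to the ATS service account
}

case "$1" in
    start)           do_start ;;
    stop|force-stop) : ;;              # nothing to undo: the binding vanishes on reboot
    *)               echo "Usage: /etc/init.d/trafficserver-raw start" ;;
esac
```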

sudo cp $WHATEVERYOURWORKINGDIRIS/trafficserver-raw /etc/init.d    # Hook the ATS raw helper start script into init.d
sudo update-rc.d trafficserver-raw start 91 2 3 4 5 .              # so that it starts before trafficserver itself
sudo ln -s /usr/bin/trafficserver /etc/init.d/trafficserver        # Hook the trafficserver start/stop script 
sudo update-rc.d trafficserver start 93 2 3 4 5 . stop 07 0 1 6 .  # ATS is started after Apache and the raw helper
 
sudo mkdir /x1/var_log_trafficserver                               # You need to set up the log directory.  ATS doesn't do this
sudo chown ats-data:ats-data /x1/var_log_trafficserver             # And it needs to be owned by ats-data or the logging will fail
sudo ln -s /x1/var_log_trafficserver /var/log/trafficserver        # Link it into the Debian standard Logging directory structure

Configuration

The ATS Administrator's Guide discusses two simple methods of defining the configuration[3]. However, the package also provides Perl modules to facilitate configuration for admins familiar with Perl, so I have used this route. The baseline configuration is as follows:

#
# In this configuration, Apache Traffic Server (ATS) is configured as an HTTP reverse proxy 
# connected via the IP loopback connector (127.0.0.1) to an in-VM Apache service which is running a
# dedicated MediaWiki application to serve the OOo wiki. The VM is single-core, with the application
# typically serving ~5-30 page requests per minute, plus associated image/CSS/JS requests needed
# to render each page. For this type of configuration/load, the default ATS requires few changes.
#
# As ATS provides a Perl Module to facilitate such changes, I have used this, and only four config
# files need additions / changes.
#
BEGIN {push @INC, "/x1/wiki-kits/trafficserver-3.0.1/contrib/perl/ConfigMgmt/lib"; }
 
use strict;
use English;
use Sys::Hostname;
use Apache::TS::Config::Records;
sub writeFile ($$) { 
  my( $outFile, $content ) = @_;
  print STDERR "Writing " . length( $content ) . " bytes to $outFile\n"; 
  open( FILE, ">", $outFile ) or die "Cannot open $outFile: $!\n";
  print FILE $content;
  close FILE; 
}
 
#
# Set up context.  The O/P dir can be specified but defaults to /etc/trafficserver
#
 
my $ATS  = "/etc/trafficserver";
my $outDir = ( $#ARGV > -1 ) ? $ARGV[0] : $ATS;
 
die "This procedure must be run as root to write to /etc\n" if $UID > 0 and $outDir =~ m!^/etc!;
my $publicServer = ( hostname() eq 'ooowikivm' ) ? 'ooowikiv.home' : 'wiki-ooo.apache.org';  
my $wikiServer   = 'localhost';
 
#
# Add mapping rules to remap.config
#
my $remap = qx(cat $ATS/remap.config.default) . 
              "map          http://$publicServer/ http://$wikiServer/\n" .
              "reverse_map  http://$wikiServer/ http://$publicServer/\n";
writeFile "$outDir/remap.config", $remap;
 
#
# Create storage.conf
#
my( $rawDevice, $diskSize ) = ( "/dev/raw/raw1", 2*1024*1024*1024 );  # 2 GB
writeFile "$outDir/storage.config", "$rawDevice $diskSize\n";
 
my $cfg = new Apache::TS::Config::Records(file => "$ATS/records.config.default");
 
$cfg->set( conf => "proxy.config.exec_thread.autoconfig",                   val => "0"     );
$cfg->set( conf => "proxy.config.exec_thread.limit",                        val => "2"     );
 
$cfg->set( conf => "proxy.config.reverse_proxy.enabled",                    val => "1"     );
 
$cfg->set( conf => "proxy.config.cache.ram_cache.size",                     val => "64M"   );
$cfg->set( conf => "proxy.config.cache.ram_cache_cutoff",                   val => "512K"  );
 
$cfg->set( conf => "proxy.config.cache.ram_cache.compress",                 val => "2"     );
$cfg->set( conf => "proxy.config.cache.threads_per_disk",                   val => "4"     );
 
$cfg->set( conf => "proxy.config.url_remap.remap_required",                 val => "1"     );
$cfg->set( conf => "proxy.config.url_remap.pristine_host_hdr",              val => "0"     );
 
$cfg->set( conf => "proxy.config.http.server_port",	                        val => "80"    );
$cfg->set( conf => "proxy.config.http.insert_response_via_str",             val => "1"     );
$cfg->set( conf => "proxy.config.http.accept_no_activity_timeout",          val => "30"    );
$cfg->set( conf => "proxy.config.http.keep_alive_no_activity_timeout_out",  val => "5"     );
$cfg->set( conf => "proxy.config.http.negative_caching_enabled",            val => "1"     );
$cfg->set( conf => "proxy.config.http.negative_caching_lifetime",           val => "240"   );
$cfg->set( conf => "proxy.config.http.normalize_ae_gzip",                   val => "1"     );
$cfg->set( conf => "proxy.config.http.server_max_connections",              val =>"100" );
 
$cfg->set( conf => "proxy.config.dns.search_default_domains",               val => "0"     );
 
$cfg->set( conf => "proxy.config.hostdb.size",                              val => "1000" );
$cfg->set( conf => "proxy.config.hostdb.storage_size",                      val => "2M"  );
 
$cfg->set( conf => "proxy.config.ssl.enabled",                              val => "0"     );
$cfg->set( conf => "proxy.config.ssl.number.threads",                       val => "0"     );
 
$cfg->set( conf => "proxy.config.http_ui_enabled",                          val => "3"  );
 
#$cfg->append( line => "CONFIG proxy.config.http.enable_http_info INT 1" );
#$cfg->set( conf => "proxy.config.mlock_enabled", val => "2" ); 
$cfg->write( file => "$outDir/records.config" );
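For reference, when run on the public host the script above writes a one-line storage.config (2 × 1024 × 1024 × 1024 = 2147483648 bytes) and appends two mapping rules to the distributed remap.config defaults, roughly as follows:

```
# storage.config -- one raw device, sized in bytes
/dev/raw/raw1 2147483648

# remap.config -- appended after the remap.config.default contents
map          http://wiki-ooo.apache.org/ http://localhost/
reverse_map  http://localhost/ http://wiki-ooo.apache.org/
```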

Configuring MediaWiki

Since Traffic Server captures the end-user browser requests and forwards those which require processing by Apache through the localhost loopback connector, Apache will always receive "127.0.0.1" as the direct remote address. However, as Traffic Server forwards requests to Apache, it is configured to add the "X-Forwarded-For" header so that the remote address from the outside world is preserved. MediaWiki must be configured to use the "X-Forwarded-For" header in order to correctly display user addresses in Special:RecentChanges.
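Note that X-Forwarded-For may carry a comma-separated chain of addresses when more than one proxy is involved; the originating browser is the left-most entry. A quick shell illustration (the addresses are made up):

```shell
# Hypothetical header value as it might arrive via ATS plus an upstream proxy:
xff='203.0.113.9, 198.51.100.7'

# The left-most entry is the originating client:
echo "$xff" | cut -d, -f1    # prints 203.0.113.9
```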

The required configuration for Traffic Server is essentially the same as for Squid, with the following config assignments in LocalSettings.php:

$wgUseSquid = true;
$wgSquidServers = array('127.0.0.1');
// $wgInternalServer = '';           // Internal server name as known to Squid. NOT SET.
// $wgMaxSquidPurgeTitles = 0        // Maximum no of pages to purge in one client operation. NOT SET.
// $wgSquidMaxage =                  // Cache timeout for the squid. NOT SET.
$wgUseXVO = true;                    // Send X-Vary-Options header for better caching.
$wgDisableCounters = true;           // Disable collection of Page counters
$wgShowIPinHeader = false;           // Disable display of IP for guests as this frustrates caching

These settings serve two main purposes:

  • If a request is received from the Traffic Server cache server, the MediaWiki logs need to display the IP address of the user, not that of Traffic Server. A Special:RecentChanges in which every edit is reported as '127.0.0.1' isn't meaningful. Listing this address in $wgSquidServers lets the application know that the user IP address should be obtained from the 'x-forwarded-for' header.
  • Whenever a page or file is modified on the wiki, MediaWiki must be configured to send a purge notification to any caches which serve its content. $wgSquidServers contains the list of such servers. (The name is misleading: Squid was simply the first cache supported by MediaWiki.)
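On the wire, each such notification is a plain HTTP request using the PURGE method, sent to every server listed in $wgSquidServers; it looks roughly like the following (the URL is illustrative):

```
PURGE http://wiki-ooo.apache.org/index.php/Main_Page HTTP/1.0
```

Traffic Server then drops that URL from its cache, so the next request is regenerated by the back-end wiki.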

Note that the configuration is already tuned to support PHP APC acceleration for both MediaWiki code and metadata caching.

Outstanding issues

  • Logging and Page Stats. Most inbound requests will be handled by the Traffic Server cache, so the internal stats collected by MediaWiki will only reflect cache misses. We need to think about how we handle logfile analysis and stats in general. I have turned off page counters as these would in future only reflect cache misses.
  • Decision to retain a MediaWiki 1.15 baseline. For MediaWiki v1.16.x and later, internationalisation can add a material D/B load. For this and other schema changes, we've decided to stick with the last stable MW 1.15.x version (1.15.6) as the S/W baseline.

Apache configuration

The Apache server is configured to listen on the standard port at the localhost IP, and accepts all requests from Traffic Server:

Listen 127.0.0.1:80

The Apache web server's default logging format would only list 127.0.0.1 as the connecting address. Hence an extra "cached" logging option is enabled[4], which captures the originating browser's address by using the "X-Forwarded-For" header passed by Traffic Server.

LogFormat "%{X-Forwarded-for}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" cached
CustomLog /var/log/apache2/access.log cached
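With the client address restored to the first field, ordinary per-client log analysis keeps working. A quick sanity check against one hypothetical line in the cached format:

```shell
# One hypothetical access.log line in the "cached" format defined above:
line='203.0.113.9 - - [23/Aug/2011:10:15:00 +0000] "GET /wiki/Main_Page HTTP/1.1" 200 5123 "-" "Mozilla/5.0"'

# The originating browser address is now field 1, not 127.0.0.1:
echo "$line" | awk '{print $1}'    # prints 203.0.113.9
```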

See also

  • Cache
  • Squid caching

References

  1. http://trafficserver.apache.org/
  2. http://trafficserver.apache.org/docs/v2/admin/reverse.htm
  3. http://trafficserver.apache.org/docs/v2/admin/configure.htm
  4. http://httpd.apache.org/docs-2.2/mod/mod_log_config.html

Copyright © 2011 The Apache Software Foundation. Licensed under the Apache License, Version 2.0.
Apache Traffic Server, Apache, the Apache Traffic Server logo, and the Apache feather logo are trademarks of The Apache Software Foundation.

