There is so much knowledge about the Zend Framework, but so few outlets for it, that the most obvious golden nuggets sit around in people’s minds forever untapped. In writing the Performance Optimisation chapter draft for Zend Framework: Surviving The Deep End, I recognised that it was a huge appendix – and nowhere near complete. At around 6500 words, I had to rein myself in before it doubled in size. Well, actually it DID double in size – but I kept cutting out the less immediately relevant bits for another day. Here’s one of those bits. Why I keep calling it a “bit”, I don’t know…

In this article I explore one particular topic to be added to that appendix (a mini-book by itself if this rate keeps up) – the concept of Page Caching. Briefly mentioned in the book’s appendix, this is one of the most powerful and most overlooked caching optimisations. There are still blogs and forums that seem immune to it.

Before I’m busted, these are not new ideas. It’s likely they are familiar to many programmers, so the article is more of an effort to demonstrate how they can be implemented within the confines of the Zend Framework in a more conveniently managed way, along with some facets pulled from other frameworks I’ve always liked.

What Is Page Caching?

Page Caching is the process of caching entire generated HTML documents for a period of time so that the expensive task of dynamically generating them is avoided for that period. Done correctly, it should completely bypass the Zend Framework MVC stack. This can net you incredible results! When developing the mini-application (it’s really tiny) for presenting Zend Framework: Surviving The Deep End online, implementing Page Caching via Zend_Cache resulted in a 10-fold increase in requests per second from my Slicehost VPS. This is the kind of optimisation you should be fantasising about :) .

Page Caching is great, but it’s only applicable if the output of the URL being cached changes in a detectable or predictable way. What this means is that once we cache a page, we only want to invalidate and clear that cache when the data driving the dynamic portions of the cached page changes. If change is readily detectable or follows a regular periodic update pattern, this can be accomplished quite handily. However, if changes are frequent, depend on unpredictable factors, or the page requires authentication, then page caching often loses its benefits.

One example of a page you can’t cache to a single location is one where user-specific details are displayed. Since this invariably changes depending on who’s requesting the same URL, caching it as a single page is unrealistic – unless you are absolutely obsessed and cache on a per-user basis! Another example is where page access needs to be restricted – then the cache can only be used after authentication, which is less effective since you still need to tap into the application.

These examples often prevent whole page caching, but that does not mean you can’t use partial caching – creating a system where the page is aggregated from both dynamically rendered HTML and cached HTML. However, partial caching does necessitate hitting the Zend Framework MVC stack, which is perceptibly slower than page caching, which should bypass the MVC stack completely.

Simple Page Caching Example

The simplest form of page caching would utilise Zend_Cache and the Zend_Cache_Frontend_Page frontend. In this example, I’ve elected to cache pages into memory using APC. If you can spare the RAM, caching in memory is a lot better than caching to files using this method. In fact any caching to memory is likely better than using files given the speed of memory access compared to file operations. You can switch to file or other caching if you prefer.

Since the goal is to bypass the Zend Framework completely to make the page request as fast as possible, the page cache is implemented in a Bootstrap class in a run() method just before the Bootstrap commences an MVC request cycle. When a cache “hit” is detected, this will serve the cached HTML from the selected cache backend and prevent the Front Controller from being run. If no “hit” is detected, the application is called upon as usual and its output recorded and cached for future requests. To control the cache’s lifetime, you can assign a Time To Live (TTL) value or you can manually purge the cache when any Model dependency is altered.
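To make that flow concrete, here’s a minimal, framework-free sketch of the hit/miss logic. It is not how Zend_Cache is implemented: the TinyPageCache class and its method names are made up for illustration, and a plain PHP array stands in for APC (so nothing survives beyond a single process).

```php
<?php
// Hypothetical, framework-free sketch of the hit/miss flow described above.
// The array-backed store stands in for APC or any other cache backend.
final class TinyPageCache
{
    /** @var array URL => array('html' => string, 'expires' => int) */
    private $store = array();

    /** @var int Time To Live in seconds */
    private $ttl;

    public function __construct(int $ttl)
    {
        $this->ttl = $ttl;
    }

    /**
     * Return the cached page for $url if it is still fresh. Otherwise run
     * $render, record what it echoes, cache that for $ttl seconds and
     * return it. $now is injectable so the expiry logic can be tested.
     */
    public function fetch(string $url, callable $render, ?int $now = null): string
    {
        $now = $now ?? time();
        if (isset($this->store[$url]) && $this->store[$url]['expires'] > $now) {
            return $this->store[$url]['html']; // cache hit: the renderer never runs
        }

        ob_start(); // cache miss: record everything the application echoes
        $render();
        $html = (string) ob_get_clean();

        $this->store[$url] = array('html' => $html, 'expires' => $now + $this->ttl);
        return $html;
    }

    /** Manual purge, e.g. after a Model the page depends on is altered. */
    public function purge(string $url): void
    {
        unset($this->store[$url]);
    }
}
```

The renderer callback plays the part of the whole MVC request cycle: on a hit it is simply never called, which is where all of the speed comes from.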

In tests on a VPS, I managed to get from 35 requests per second to over 330 requests per second for pages which were cached to APC using this method. The main bottleneck on my VPS is the CPU so everything is cached to memory since there’s nearly always spare RAM sitting around idle.

Here’s an example of a page cache implemented in a Bootstrap class.

[geshi lang=php]class ZFExt_Bootstrap
{

    // …

    public function run()
    {
        $this->setupEnvironment();
        // Implement Page Caching at Bootstrap level before any
        // MVC operations so these operations can be completely
        // avoided when a valid cache exists
        $this->usePageCache();
        // If a valid cache exists, execution exits!
        $this->prepare();
        $response = self::$frontController->dispatch();
        $this->sendResponse($response);
    }

    public function usePageCache()
    {
        $frontendOptions = array(
            // cache for 1 hour
            'lifetime' => 3600,
            // Disable caching by default for all URLs
            'default_options' => array(
                'cache' => false
            ),
            // Only cache URLs for Index and News controllers
            // matching the following patterns
            'regexps' => array(
                '^/$' => array('cache' => true),
                '^/index/' => array('cache' => true),
                '^/news/' => array('cache' => true),
                '^/blog/tags/' => array('cache' => true)
            )
        );
        // Note: APC backend has no options!
        $cache = Zend_Cache::factory(
            'Page',
            'Apc',
            $frontendOptions
        );
        // Serve cached page (if it exists) and exit
        // otherwise cache all output after this point
        // assuming caching is enabled for the current URL
        $cache->start();
    }
}[/geshi]

If you want to eke out a little more speed, you can skip the Bootstrap and just implement the cache in your index.php file so any overhead from using the Bootstrap class is avoided.
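If the regexps option in the listing looks a little opaque, the sketch below shows how I understand the frontend to combine it with default_options: start from the defaults, then apply the options of the last pattern that matches the request URI. Note that resolvePageCacheOptions() is a hypothetical helper, not part of Zend_Cache, and the last-match-wins behaviour is my reading of the documentation rather than a copy of the actual implementation.

```php
<?php
/**
 * Hypothetical helper mirroring how 'default_options' and 'regexps'
 * appear to combine: begin with the defaults, then let the last
 * pattern matching the request URI override them.
 */
function resolvePageCacheOptions(string $requestUri, array $defaults, array $regexps): array
{
    $options = $defaults;
    foreach ($regexps as $pattern => $specific) {
        // Backticks as delimiters so '/' needs no escaping in the patterns
        if (preg_match('`' . $pattern . '`', $requestUri) === 1) {
            $options = array_merge($defaults, $specific);
        }
    }
    return $options;
}

// The same settings used in the Bootstrap example
$defaults = array('cache' => false);
$regexps = array(
    '^/$'          => array('cache' => true),
    '^/index/'     => array('cache' => true),
    '^/news/'      => array('cache' => true),
    '^/blog/tags/' => array('cache' => true),
);

$newsOptions  = resolvePageCacheOptions('/news/2009/01', $defaults, $regexps);
$adminOptions = resolvePageCacheOptions('/admin/login', $defaults, $regexps);
```

With these settings a request for /news/2009/01 resolves to caching enabled, while /admin/login matches nothing, falls through to the default and is never cached.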

Why PHP Dependent Page Caching Sometimes Sucks

In testing on one of my Virtual Private Servers with Slicehost, simple page caching to memory using Zend_Cache nets me a 9-10 fold increase in requests per second. Nearly all of that benefit lies in avoiding the Zend Framework MVC stack.

If your application has scaled beyond a handful of servers, this is probably the end of the road for you, but if you are still hosted on a single server (or a small scaled solution where files are maintained on one server) you can push the boundaries of page caching even further. The rest of this article concerns this scenario.

A lot of the reason why the ZF’s default page caching sucks at times is that it requires PHP. Some people might have warm and fuzzy feelings but in the game of getting the most out of any server, Apache is a loose cannon. It uses a lot of memory and CPU and that gets worse when PHP is called upon.

The next optimisation level therefore is avoiding PHP and/or Apache completely. If we can remove PHP from the equation, our Apache processes will use less memory. If we can remove even Apache from the equation we save even more. By relying on PHP to retrieve cached pages, neither of these is possible without some drastic change.

On a related note, one growing strategy to avoid Apache is to offload requests for static resources (images, css, etc.) onto a more efficient HTTP server which uses less memory and CPU than Apache. Only requests needing PHP would then go through Apache. Apache is already fast, but it’s not the fastest! Two common choices for alternative HTTP servers are lighttpd and nginx.

This strategy is often called a “reverse proxy”. It involves reconfiguring Apache to operate as a backend HTTP server (by making it listen to a port other than 80) which exists to service requests that need PHP. All other requests are serviced by a frontend HTTP server listening on port 80, like nginx, which can serve non-dynamic static resources like images, javascript, css and so on, at incredible rates using minimal memory and CPU. Since nginx is the frontend HTTP server, whenever it detects a dynamic request requiring the use of PHP it will proxy that request to Apache which is listening on an alternative port. All of this keeps Apache usage to a minimum. While this may appear to be a complex idea, in practice it’s ludicrously simple to implement. I highly suggest locating a tutorial for setting up an nginx reverse proxy and giving it a trial run – it’s well worth the effort.

Back on track, avoiding PHP would mean that we can’t access the Zend_Cache version of page caches, since any page cache retrieval would need PHP. So we’ll up the ante and instead cache pages as static HTML files which Apache (or nginx in a reverse proxy setup) can serve directly without invoking PHP.

Note: This is not very scalable since it requires file manipulation. If you are scaling beyond one or two servers, the current page caching may have to do. However for single servers you’ll see the advantages.

How much of a difference will static page caching make compared to in-memory page caching using APC and Zend_Cache? I did a few quick benchmarks using my nginx reverse proxy and came up with the following from a moderate 10,000 request ApacheBench run with 100 concurrent users, repeated three times and averaged. The tests use a simple PHP file which echoes “Hello World” and exits, and a static text file containing the same.

Apache (serving PHP file) : 711.20 requests/sec
Apache (serving static file) : 2408.55 requests/sec [3.39x PHP]
Nginx (serving static file) : 4208.52 requests/sec [5.92x PHP, 1.75x Apache]

In other words, screw Apache :) . Fluctuations between hardware, configurations and ApacheBench aside, nginx outperforms Apache by quite a margin in a reverse proxy setup where static files are favoured over PHP. If anyone benches differently, comment with your stats – I’m throwing these out from a system optimised for nginx so they aren’t gospel, and the margin is likely inflated a bit as a result :) .
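For anyone wanting to check the arithmetic or plug in their own ApacheBench numbers, the relative figures are just simple ratios of requests per second. The speedup() helper below is a throwaway function written for this post, nothing more.

```php
<?php
// Requests per second from the benchmark run above
$results = array(
    'apache_php'    => 711.20,
    'apache_static' => 2408.55,
    'nginx_static'  => 4208.52,
);

/** How many times faster $fast is than $slow, rounded to two decimals. */
function speedup(float $fast, float $slow): float
{
    return round($fast / $slow, 2);
}

$apacheStaticVsPhp = speedup($results['apache_static'], $results['apache_php']);
$nginxVsPhp        = speedup($results['nginx_static'], $results['apache_php']);
$nginxVsApache     = speedup($results['nginx_static'], $results['apache_static']);

printf(
    "Apache static: %.2fx PHP\nnginx static: %.2fx PHP, %.2fx Apache static\n",
    $apacheStaticVsPhp,
    $nginxVsPhp,
    $nginxVsApache
);
```

So a static file from Apache comes in at roughly 3.4 times the PHP figure, and nginx at nearly 6 times, or about 1.75 times Apache serving the same static file.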

If you translate this to a Zend Framework application applying page caching as is (using PHP on every request), well, you get the picture. You CAN do better! How?!

Static HTML Caching

Static File Caching is a solution whereby the output from applications is cached as a static HTML/XML/RSS/JSON file. The first request hits the application, but the next will serve up the static file and never even take a peep at PHP. This does, obviously, have a downside! Yes, it’s zipping along at 4000+ requests per second but if our application is completely bypassed, how are we going to manage the static file cache? Where will it be cleared, invalidated, and replaced when the data it’s generated from is updated? Set that thought aside for the moment and we’ll solve one problem at a time.

Static file caching is not implemented in the Zend Framework so we need to do a little legwork and create a custom backend to support it.

The goal of our custom backend is to create a static HTML file containing the application output we want to cache. It gets a little more complex, though: the static file needs to be stored in such a way that its location mirrors the URL being requested. We also need to ensure, if possible, that adding static file caching (essentially .html, .xml, etc. files) does not force us to change the routing we’ve configured. Unfortunately that’s not always avoidable, and a prime problem here is pages requested with parameters (GET query strings, or POST data). For this reason, static file caching encourages adding query parameters into the URL’s path section, even for POSTs where it makes some sense.
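As a rough idea of what the file-writing half of such a backend involves, here’s a small sketch. The writeStaticPage() function is hypothetical and deliberately knows nothing about Zend_Cache; it assumes you have already decided on the target file path, and it writes to a temporary file first so the web server never picks up a half-written page.

```php
<?php
/**
 * Hypothetical sketch: store rendered output as a static file.
 * $file is the full path of the file the web server should serve.
 */
function writeStaticPage(string $file, string $html): bool
{
    $dir = dirname($file);
    // Create the directory tree mirroring the URL if it is not there yet
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }

    // Write to a temporary file in the same directory, then rename it
    // into place, so readers see either the old page or the new one
    $tmp = tempnam($dir, 'tmp');
    if ($tmp === false || file_put_contents($tmp, $html) === false) {
        return false;
    }

    chmod($tmp, 0644); // tempnam() creates files readable only by the owner
    return rename($tmp, $file);
}
```

Writing the same file twice simply replaces the old copy, which is what you’d want when a page is regenerated after its data changes.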

Here we start meeting some of the limitations of static HTML caching – it works well for straightforward URLs, but throw in query strings or POST data and the static caching will need to be replaced with the current Zend_Cache page caching, which relies on PHP for every request.

However, I’m going to keep this example simple and use a URL with parameters as would be typical of any Zend Framework application. Consider the following blog URL:

/blog/tags/zend-framework

We’ll assume this route points to a Controller which triggers the display of some blog posts which have been tagged “zend-framework”. Keeping with simplicity, we’re ignoring pagination (hint: make the page number an extra URL parameter and it works). To enable static HTML caching (again, XML and others need their own magic so we’ll stay with HTML for now), we’ll need to do a little wizardry so Apache will map this URL to:

/html/blog/tags/zend-framework.html

At first anyway – once Apache notices there is no such HTML file it should go back to the original URL and map it onto index.php as usual. To accomplish this requires a small change to our Rewrite Rules. Based on the recommended rewrite rules for Zend Framework, here’s the new version:

RewriteEngine On

RewriteRule ^/(.*)/$ /$1 [QSA]
RewriteRule ^$ html/index.html [QSA]
RewriteRule ^([^.]+)/$ html/$1.html [QSA]
RewriteRule ^([^.]+)$ html/$1.html [QSA]

RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d

RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ /index.php [NC,L]
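To illustrate what the first few rules are doing, here’s the same mapping expressed as a PHP function. This is only my simplified reading of the rules: staticCachePath() is a made-up name, and edge cases such as doubled slashes are ignored.

```php
<?php
/**
 * Hypothetical sketch of the URL-to-file mapping performed by the
 * rewrite rules. Returns the cache file path relative to /public,
 * or null when the rules would leave the request untouched.
 */
function staticCachePath(string $urlPath): ?string
{
    // Ignore the leading slash and any trailing slash
    $path = trim($urlPath, '/');

    if ($path === '') {
        return 'html/index.html'; // the site root
    }
    if (strpos($path, '.') !== false) {
        return null; // anything with a dot (e.g. /css/site.css) is left alone
    }
    return 'html/' . $path . '.html';
}
```

So /blog/tags/zend-framework maps to html/blog/tags/zend-framework.html, and if no such file exists the final rule hands the request to index.php as usual.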

In case you were wondering, the extra reference to a “html” directory is the location relative to /public where we will cache static HTML. Keeping it separated from /public will enable us to clear the entire cache in one go if the need arises, and generally just keeps the /public directory a bit tidier.
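Clearing the entire cache in one go then amounts to emptying that one directory. A minimal sketch, assuming the cache lives in a single directory of its own; clearStaticCache() is a hypothetical helper name, not something the framework provides.

```php
<?php
/**
 * Hypothetical sketch: delete every cached file under $cacheDir while
 * leaving the directory itself in place. Returns the number of files
 * removed.
 */
function clearStaticCache(string $cacheDir): int
{
    if (!is_dir($cacheDir)) {
        return 0;
    }

    $removed = 0;
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // visit children before their parent directory
    );

    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname()); // already emptied by earlier iterations
        } elseif (unlink($item->getPathname())) {
            $removed++;
        }
    }
    return $removed;
}
```

Because only the contents are removed, the next request for any page simply misses the static file, falls through to index.php and regenerates it.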