Nov 8, 2009

Client Side Caching: basics, automation, challenges

It was a long way to setup correct schema to handle all issues with client side caching. Let's review it step by step.

Cache Basics

It's very simple to cache a single file - we need just to add Cache-Control header (for HTTP/1.1) and Expires header (for HTTP/1.0). Headers look like this:

Expires: Thu, 31 Dec 2019 23:55:55 GMT
Cache-Control: max-age=315360000

Is it so straightforward? Well, almost.

Caching on proxy

Internet consists of different servers. Some of them are common websites. Some of them are transport nodes which just redirect traffic from user to website and vice versa. Providers are interested in traffic decrease. So they turn on caching on their servers to serve common requests from inner network, not to transfer them outside.

How can we affect this caching? In RFC 2616 there is defined a way to do this: Cache-Control: public. So our example will turn to:

Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000, public

Caching automation

But how all this can be applied to all files on the server? Apache has a special module to handle such situations: mod_expires. We can turn it on in Apache configuration with these directives:

ExpiresActive On
ExpiresDefault "access plus 10 years"

This seems to be very easy. What is the trouble?

What must be cached?

As far as almost all resources on a website are static (except maybe HTML documents) we can cache them all. The first approach is to cache all but skip caching for HTML documents. It can be performed this way (in Apache configuration):

ExpiresActive On
ExpiresDefault "access plus 10 years"
<FilesMatch \.(html|xhtml|xml|shtml|phtml|php)$>
ExpiresActive Off
</FilesMatch>

So we cache all except HTML documents. But what if we have dynamic CSS / JS files or images? I.e. thumbnails are generated via PHP, and a number of styles are merged 'on fly'.

ExpiresByType is a magic wand!

We can use another directive to cache all required files by their MIME types (which are usually properly defined on server). This way:

ExpiresActive On
ExpiresByType text/css A315360000

It's generally better because in case of dynamic images with can apply cache headers by their type, not their extension. Well, that's all?

Different challenges

There can be a lot of more or less common issues with this approach:

  • There are not common MIME types for a number of resources (i.e. for font files). So we can apply cache headers only by their extensions with FilesMatch. Because we can't be sure for all server environments it's better to combine both approaches: set headers with FilesMatch and with ExpiresByType.
  • How to define what files must be cached? Web Optimizer has a great number of pre-defined MIME types and extensions to handle most of static resources on the server. Also it has several options to manage caching behavior.
  • If there is no mod_expires on the server we can emulate all caching behavior with a light PHP proxy. In case if there is mod_rewrite we can just redirect all required resources internally to this proxy and get files cached.
  • If there is no both mod_expires and mod_rewrite on the server we need to parse HTML code to replace all calls to static files with their equivalents via PHP proxy. That's all.

Web Optimizer has not only outstanding support for various cache techniques for different environments but also provides a number of ways to force cache reload (if we have resource changed). We will discuss this in the next posts.

No comments:

Post a Comment