Oct 6, 2009

PHP proxy for static assets

For the last several days we have been implementing a PHP emulation of mod_expires + mod_headers for hosts that don't have these modules installed. The idea is simple: forward all static resources (images for now) to a PHP script that sets the correct headers (a 10-year cache lifetime) and handles conditional requests (ETag). In short, maximum caching with minimal CPU wasted.

OK. First of all, we need to redirect every <img> to this PHP script (wo.static.php). This can be done in two ways:

  • Via mod_rewrite. All image URLs with a dot (.) appended to the file name (the minimal change to the file name, made by the Web Optimizer core) are internally rewritten to wo.static.php?FILENAME.
  • Or place a direct call to this script instead of the original image URL (if mod_rewrite is not available). The file names will be ugly, but the images will be cached.
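
The two options above differ only in the URL that ends up in the page. A minimal sketch of that choice, assuming a hypothetical helper wo_static_url() and a $hasModRewrite flag (neither name comes from Web Optimizer itself, and the script location is a guess):

```php
<?php
// Hypothetical helper: builds the URL to put into <img src="...">.
function wo_static_url(string $src, bool $hasModRewrite, string $script = 'wo.static.php'): string
{
    if ($hasModRewrite) {
        // A trailing dot marks the image for the internal rewrite rule.
        return $src . '.';
    }
    // No mod_rewrite: call the proxy script directly,
    // passing the file name as the query string.
    return $script . '?' . $src;
}

echo wo_static_url('/images/logo.png', true), "\n";  // /images/logo.png.
echo wo_static_url('/images/logo.png', false), "\n"; // wo.static.php?/images/logo.png
```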

Protect from hacking

A long time ago, maybe in the last century, there were a lot of injection attacks (NULL-byte for Perl, maybe the same for PHP) that allowed any user to read the contents of any file on the server (e.g. /etc/passwd or script sources with hardcoded database credentials). We prevent this by checking the file extension and rejecting unsupported (=dangerous) files.

We also can't allow users (=possible hackers) to access any system resources except this website's images. So we check that the file's full path starts with the document root, and only then serve the file.
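
Both checks can be sketched roughly as follows. This is an illustration under assumptions, not the actual wo.static.php code: the function name wo_resolve_static() and the extension whitelist are hypothetical, and the path is resolved with realpath() so that ../ sequences are collapsed before the document root comparison.

```php
<?php
// Hypothetical helper: returns the absolute path of a file that is safe
// to serve, or null if the request must be rejected.
function wo_resolve_static(string $requested, string $documentRoot): ?string
{
    // Assumed whitelist; the post only says "images".
    $allowed = ['png', 'gif', 'jpg', 'jpeg'];

    // Reject NULL bytes outright (the classic injection mentioned above).
    if (strpos($requested, "\0") !== false) {
        return null;
    }

    // Reject every extension that is not explicitly supported.
    $extension = strtolower(pathinfo($requested, PATHINFO_EXTENSION));
    if (!in_array($extension, $allowed, true)) {
        return null;
    }

    // Resolve symlinks and ../ so the prefix check sees the real location.
    $root = realpath($documentRoot);
    if ($root === false) {
        return null;
    }
    $path = realpath($root . DIRECTORY_SEPARATOR . ltrim($requested, '/\\'));
    if ($path === false || !is_file($path)) {
        return null;
    }

    // Serve only files that live inside the document root.
    $prefix = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strpos($path, $prefix) === 0 ? $path : null;
}
```

A caller would typically pass $_SERVER['QUERY_STRING'] and $_SERVER['DOCUMENT_ROOT'] and answer with a 404 when null comes back.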

File name and type

To make sure user agents handle the received content correctly, we must:

  1. Add a Content-Type header based on the already validated (and supported!) extension.
  2. Add a Content-Disposition header so that the served file carries the name of the requested file rather than wo.static.php.
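
A rough sketch of these two headers. The function name, the extension-to-MIME map and the choice of an inline disposition are assumptions made for illustration; the post does not say which disposition type the script uses.

```php
<?php
// Hypothetical helper: builds the two headers for a requested file name.
// Returns an empty array for unsupported extensions.
function wo_type_headers(string $requested): array
{
    // Assumed map of supported extensions to MIME types.
    $types = [
        'png'  => 'image/png',
        'gif'  => 'image/gif',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
    ];

    $extension = strtolower(pathinfo($requested, PATHINFO_EXTENSION));
    if (!isset($types[$extension])) {
        return [];
    }

    // Keep only the file name and replace anything that could break
    // the quoted header value.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($requested));

    return [
        'Content-Type: ' . $types[$extension],
        'Content-Disposition: inline; filename="' . $name . '"',
    ];
}

// Usage: send each line with header().
// foreach (wo_type_headers('/images/logo.png') as $line) { header($line); }
```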

Conditional caching

Back to the cache logic. First we need to calculate (and check) the file's checksum. Google uses a combination of the file's name, modification time and something else. We can't use the modification time (Web Optimizer can run on a cluster of servers, where it may differ between nodes), so the most general approach is to provide an ETag with a checksum calculated from the file's content.

This works well for small files, but leads to excessive server-side load for large resources (then again, we are talking about web page images: they are always less than 150 Kb, usually less than 20 Kb). And we can use a lightweight function for this: crc32. All we need is a hash of the content that changes after any modification of the file.
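
As a sketch of this step, the ETag can be derived from crc32() and compared with the browser's If-None-Match header. The function names are hypothetical, and accepting weak validators (W/ prefix) and * is an assumption about how lenient the comparison should be.

```php
<?php
// Hypothetical helper: content-based ETag, 8 hex digits in the quotes
// that the ETag syntax requires.
function wo_etag(string $content): string
{
    return '"' . sprintf('%08x', crc32($content)) . '"';
}

// Hypothetical helper: true if the browser already holds this version.
function wo_is_not_modified(string $etag, ?string $ifNoneMatch): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    // If-None-Match may carry several comma-separated tags.
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '*') {
            return true;
        }
        // Ignore the weak-validator prefix when comparing.
        if (strpos($candidate, 'W/') === 0) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === $etag) {
            return true;
        }
    }
    return false;
}

// Usage sketch:
// $etag = wo_etag(file_get_contents($path));
// if (wo_is_not_modified($etag, $_SERVER['HTTP_IF_NONE_MATCH'] ?? null)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit;
// }
// header('ETag: ' . $etag);
```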

Non-conditional caching

After this check (the hash has been calculated and the browser did not send a matching one) we can send the pair of standard cache headers:

  • header("Cache-Control: public, max-age=" . $timeout);
  • header("Expires: " . gmdate('D, d M Y H:i:s', $_SERVER['REQUEST_TIME'] + $timeout). ' GMT');

Cache-Control is for HTTP/1.1, Expires is for HTTP/1.0. Only after this do we send the content of the requested file. The magic is over!
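
Put together, the order of operations might look like the sketch below. It is written as a pure function that returns the status, headers and body instead of sending them, so the flow is easy to follow; wo_build_response() and its parameters are hypothetical names, and the exact-match ETag comparison is a simplification.

```php
<?php
// Hypothetical helper: decides between "304 Not Modified" and a full,
// cacheable "200 OK" response for already validated file content.
function wo_build_response(string $content, ?string $ifNoneMatch, int $now, int $timeout): array
{
    $etag = '"' . sprintf('%08x', crc32($content)) . '"';

    // Conditional hit: the browser's copy is still valid, send no body.
    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) {
        return ['status' => 304, 'headers' => ['ETag: ' . $etag], 'body' => ''];
    }

    // No match: send the cache headers first, then the file content.
    return [
        'status'  => 200,
        'headers' => [
            'ETag: ' . $etag,
            'Cache-Control: public, max-age=' . $timeout,
            'Expires: ' . gmdate('D, d M Y H:i:s', $now + $timeout) . ' GMT',
        ],
        'body' => $content,
    ];
}

// 10 years in seconds, matching the cache lifetime mentioned at the start.
$timeout = 10 * 365 * 24 * 60 * 60;
$response = wo_build_response('image bytes', null, time(), $timeout);
echo $response['status'], "\n"; // 200
```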

Just a reminder: Web Optimizer uses standard web server options (mod_expires) to cache static content. Only if they are unavailable does it apply the algorithm described above.
