Nov 14, 2009

Client Side Caching: proxy servers and forced reload

The previous blog post covered caching basics. Now let's review how proxy servers treat cached assets and how we can force a cache reload.

Cache reload

The main issue with far-future Expires headers is that the browser doesn't re-request the resource but takes it from its local cache. So if you change anything on your website, the changes won't be visible to returning users who already have the old styles and scripts cached (HTML documents usually aren't cached so aggressively).

So what can we do about this? How can we tell browsers to re-request such resources?

Main cache reload patterns

There are two main patterns to force browsers (user agents) to request the current version of an asset once more.

  • Add a GET parameter to the file name (its value should indicate the new state of the asset). For example:
    styles.css -> styles.css?20091114
  • Change the "physical" file name. For example:
    styles.css -> styles.v20091114.css

Both approaches change the URL of the asset, which forces the browser to request it again.
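
The two patterns above can be sketched in a few lines of Python. This is only an illustration: the function names and the `prefix` parameter are assumptions made for this example, and the URL is assumed to be a plain path without an existing query string.

```python
from pathlib import PurePosixPath


def with_query_version(url: str, version: str) -> str:
    """Pattern 1: append the version as a GET parameter."""
    return f"{url}?{version}"


def with_filename_version(url: str, version: str, prefix: str = "v") -> str:
    """Pattern 2: put the version into the file name, before the extension."""
    path = PurePosixPath(url)
    # stem is the name without the last extension, suffix is that extension
    return str(path.with_name(f"{path.stem}.{prefix}{version}{path.suffix}"))


print(with_query_version("styles.css", "20091114"))
print(with_filename_version("styles.css", "20091114"))
```

Either function yields a URL the browser has not seen before, so its cached copy of the old URL is simply never consulted.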

Cache reload and proxy servers

As you can see, the first approach is simpler than the second, but there are a few possible issues with it. First of all, some proxy servers don't cache URLs with GET parameters (i.e. our styles.css?20091114). So if you have a lot of visitors from a network behind one firewall, we will serve this asset to each visitor separately, because the proxy server won't cache it. This slows the website down overall, and sometimes that can be critical.

But how can we apply a new file name without actually changing anything on the file system? Is there a way to do this by changing only the HTML code? Yes!

Apache rewrite rules

The Apache web server has a powerful tool, mod_rewrite, for performing 'hidden' redirects to local files (these are called 'internal redirects'). We can implement the second approach (a changed file name) with just one predefined rule for all files (in our case the version is a set of digits after .v):

RewriteEngine On
RewriteRule ^(.*)\.v[0-9]+\.css$ $1.css

So all such URLs will be internally redirected to their physical equivalents, and you can change the part of the URL after .v at any time; browsers will then request the asset once more.
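
To see which URLs the rule's pattern matches and what they map to, the same regular expression can be tried out in Python. This is a rough sketch that only mirrors the pattern and substitution of the rule above; it does not emulate Apache itself, and the `resolve` function name is an assumption for this example.

```python
import re

# Same pattern as in the RewriteRule: anything, then ".v<digits>.css"
VERSIONED_CSS = re.compile(r"^(.*)\.v[0-9]+\.css$")


def resolve(path: str) -> str:
    """Map a versioned CSS path to its physical file; leave other paths as is."""
    return VERSIONED_CSS.sub(r"\1.css", path)


for requested in ("styles.v20091114.css", "css/print.v2.css", "styles.css"):
    print(requested, "->", resolve(requested))
```

Paths without a .v part do not match the pattern and are returned unchanged, which corresponds to the rule not firing for them.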

Automated cache reload

There are several ways to automate the cache reload process for all changed files. Since Web Optimizer combines all resources into one file, it has to re-check the mtime (modification time) of every source file and re-combine the resources when any of them has changed.
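
A minimal sketch of such a check, assuming the sources and the combined file live on the local file system. This is not Web Optimizer's actual code; the `needs_rebuild` name and its parameters are made up for illustration.

```python
import os


def needs_rebuild(sources, combined_path):
    """Return True if the combined file is missing or older than any source."""
    if not os.path.exists(combined_path):
        return True
    combined_mtime = os.path.getmtime(combined_path)
    # One stat call per source file: this is the cost paid on every check
    return any(os.path.getmtime(src) > combined_mtime for src in sources)
```

Note that the check touches every source file, so its cost grows with the number of combined resources.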

The issues with re-checking all combined files were already described last month: it's generally not a good idea to check them all on every page visit. Instead we can cache the results of all previous checks in one file and check only its mtime, and this is what is done by default. So by default we check the modification time of just one file (the CSS or JS one) and add it either as a GET parameter or as a part of the file name.

This is applied to all files that should be cached on the client side, and results in the following:

/cache/website.css?1257927769

or

/cache/website.wo1257927769.css

As you can see, both URLs carry the same timestamp: in the first it goes as a GET parameter, in the second as a part of the file name (which an Apache mod_rewrite rule transforms back to /cache/website.css).
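
Building such URLs from a file's modification time could look roughly like this. It is a sketch under stated assumptions, not Web Optimizer's implementation: the `stamped_url` function and its parameters are hypothetical, and only the `wo` prefix is taken from the example above.

```python
import os


def stamped_url(file_path, url, as_filename=False, prefix="wo"):
    """Add the file's mtime to its public URL in one of the two forms."""
    mtime = int(os.path.getmtime(file_path))  # whole seconds since the epoch
    if as_filename:
        base, ext = os.path.splitext(url)  # "/cache/website", ".css"
        return f"{base}.{prefix}{mtime}{ext}"
    return f"{url}?{mtime}"
```

For a combined file last modified at 1257927769, the two forms would be /cache/website.css?1257927769 and /cache/website.wo1257927769.css. For the second form the rewrite rule has to match the same prefix (here .wo rather than .v).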

Overall schema

So what is the overall caching algorithm for the website?

  1. Check if we have the combined file. If not, create it.
  2. Check the mtime of the combined file. If required, add the mtime to the URL (using one of the ways described above).
  3. The browser receives HTML code with the URL of the combined file.
  4. The browser checks if it has this URL cached. If yes, we are finished here.
  5. If not, the browser requests the combined file (which is already prepared on the server or is cached on the proxy).
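
The server-side part of this schema (steps 1 and 2) can be sketched as follows. This is a simplified illustration that assumes plain concatenation of text files and the GET-parameter form of the URL; the `combined_asset_url` function and its parameters are hypothetical, not Web Optimizer's API.

```python
import os


def combined_asset_url(sources, combined_path, public_url):
    """Create the combined file if it is missing, then return a stamped URL."""
    # Step 1: build the combined file only when it does not exist yet
    if not os.path.exists(combined_path):
        with open(combined_path, "w", encoding="utf-8") as out:
            for src in sources:
                with open(src, encoding="utf-8") as f:
                    out.write(f.read())
                    out.write("\n")
    # Step 2: stamp the URL with the combined file's modification time
    mtime = int(os.path.getmtime(combined_path))
    return f"{public_url}?{mtime}"
```

The returned URL is what goes into the HTML in step 3. As long as the combined file is not rebuilt, the URL stays the same and browsers keep using their cached copy.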
