Nov 26, 2009

Static gzip is your best friend

Having touched on several aspects of caching, let's return to gzip and review a very simple and powerful technique: static gzip.

What is Static Gzip?

Static gzip is a way to serve compressed content without compressing it 'on the fly' (here is a blog post about gzip). Very roughly: we keep pre-gzipped files and serve them instead of the original ones. How can we do this?

General algorithm

  • First of all we need gzipped versions of the initial files. Usually they are named with a .gz suffix, i.e. main.css.gz. Since these files are static, we can compress them at the maximum level. On Linux you can run
    gzip -c -n -9 main.css > main.css.gz
    to get the smallest compressed file from the initial one.
  • Secondly we need a way to route HTTP requests to the compressed version of a file. With Apache and mod_rewrite we can add the following rules to an .htaccess file or the Apache configuration.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{HTTP_USER_AGENT} !Konqueror
    RewriteCond %{REQUEST_FILENAME}.gz -f
    RewriteRule ^(.*)\.css$ $1.css.gz [QSA,L]
    </IfModule>
    <FilesMatch \.css\.gz$>
    ForceType text/css
    </FilesMatch>
    <IfModule mod_mime.c>
    AddEncoding gzip .gz
    </IfModule>
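With many static assets, this pre-compression step is worth scripting. Here is a minimal sketch in Python (the file name is illustrative); it mirrors the gzip flags shown above:

```python
import gzip

def precompress(path):
    """Write a maximally compressed .gz twin next to the original file."""
    with open(path, 'rb') as src:
        data = src.read()
    # compresslevel=9 mirrors `gzip -9`; mtime=0 mirrors `gzip -n` (no timestamp)
    with open(path + '.gz', 'wb') as dst:
        dst.write(gzip.compress(data, compresslevel=9, mtime=0))

# demo: create a sample stylesheet, then pre-compress it
with open('main.css', 'w') as f:
    f.write('body { margin: 0; }')
precompress('main.css')  # produces main.css.gz
```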

Rip off the veil of the mystery

What does this set of rules actually do?

  • First of all we enable the RewriteEngine (it may already be enabled in your configuration).
  • Then the rule
    RewriteCond %{HTTP:Accept-Encoding} gzip
    matches all HTTP requests that carry the
    Accept-Encoding: gzip
    header and allows Apache to evaluate the remaining rules.
  • Then we skip the Konqueror browser
    RewriteCond %{HTTP_USER_AGENT} !Konqueror
    because it doesn't seem to understand compressed content for CSS / JavaScript files.
  • Then we check whether a physical file with the .gz suffix exists
    RewriteCond %{REQUEST_FILENAME}.gz -f
  • Only after all these checks do we perform the actual internal rewrite.
    RewriteRule ^(.*)\.css$ $1.css.gz [QSA,L]
    This redirects all requests for .css files to the .css.gz files.
  • After this rewrite (which is the last rule, hence the [L] flag) we force the Content-Type of such files to be text/css. Most servers otherwise send .gz files with a default archive MIME type, and browsers can't properly detect such content.
  • Finally, the AddEncoding gzip .gz directive sets the proper Content-Encoding header for the compressed content.

Of course this logic can be applied not only to CSS files but to all files which can be efficiently compressed — JavaScript, fonts, favicon.ico, etc.

That's all?

Graceful Degradation

What if Apache doesn't have the mod_rewrite module? Or we need to serve CSS content via PHP? Or the environment doesn't support such logic at all (CGI / IIS)?

Well, we can perform the whole algorithm via PHP too. All we need is the following.

  • Form the compressed file name (usually just by adding a .gz suffix) and check for this file's existence.
  • If no such file exists, create a compressed version of the file (i.e. via the gzencode function, which produces gzip-format output).
  • Write this compressed content to a new file (with the already defined 'gzipped' file name).
  • Set this file's modification time (mtime) to the value of the initial (non-compressed) one. Why is this required? Because along with the existence check we can also check whether the gzipped version has the same mtime, and skip its re-creation if it does. If the current file and its compressed version have different modification times, we need to re-create the latter.

So with this logic we just check for the file's existence, perform two mtime checks (all such checks can be cached at the file system level, or the cache folder can be mapped to shared memory), and serve the gzipped version. CPU time is saved, along with 80-85% of the transferred content size!
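The whole fallback can be sketched as follows. This is an illustrative Python translation of the algorithm, not Web Optimizer's actual PHP code, and the file names are hypothetical:

```python
import gzip
import os

def gzipped_path(path):
    """Return the path of an up-to-date .gz twin, (re)creating it if needed."""
    gz = path + '.gz'
    src_mtime = os.path.getmtime(path)
    # Re-create the .gz file only if it is missing or its mtime differs
    # from the source file's mtime.
    if not os.path.exists(gz) or os.path.getmtime(gz) != src_mtime:
        with open(path, 'rb') as f:
            data = f.read()
        with open(gz, 'wb') as f:
            f.write(gzip.compress(data, compresslevel=9, mtime=0))
        # Sync mtime so the next request skips re-compression.
        os.utime(gz, (src_mtime, src_mtime))
    return gz

# demo
with open('style.css', 'w') as f:
    f.write('a { text-decoration: none; }')
print(gzipped_path('style.css'))  # style.css.gz
```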

So Web Optimizer has all these approaches integrated, and since version 0.6.7 it allows you to create .gz versions of CSS / JS files within any folder on your website.

Nov 20, 2009

Web Optimizer on Facebook

Yeah, now you can join us on Facebook:

Also there is a Twitter account

Nov 19, 2009

Several layers of caching

The last two topics were about various aspects of client side caching. Also there was a topic about cache integrity check. For now let's review possible overall caching schema:

  1. Web Optimizer can get optimized HTML code from its own cache and output it to the browser. It's the first layer of caching — on the server side.
  2. Otherwise Web Optimizer gets raw results (HTML code) from the CMS engine. But they may already be cached (in the CMS). Web Optimizer doesn't touch internal CMS caching logic, it only uses it if available. So here we can have the second layer of caching — also on the server side.
  3. During client side optimization Web Optimizer usually checks (view a complete description of this logic) whether there are any files ready to be served (merged and combined ones). If yes, all is OK and Web Optimizer uses them. This is one more server side caching layer.
  4. Also when serving files Web Optimizer (but usually the Apache web server) checks whether there are gzipped versions of the files — .gz ones — and uses them (via mod_rewrite and static gzip) rather than gzipping on the fly. This is the fourth level of server side caching. And the overall website configuration can have 1-2 more levels (i.e. on a frontend proxy such as nginx or squid, or on a shared memory virtual disk).
  5. But before any files are served, the browser receives the ready HTML code with the assets' URLs and tries to find them in its own cache. If they are there (Web Optimizer sets strong caching headers), such files aren't requested from the server at all. It's a client side caching layer.
  6. If there are no such files in the local cache, the browser requests them, but a local (or not very local, but intermediate) proxy server may have them cached (read more about caching on proxies). The proxy server then serves such files faster than the initial website (and the request doesn't reach the website at all). This is effectively a client side caching layer too.
  7. If the request reaches the initial server but carries conditional caching headers (ETag or Last-Modified), the server can respond with a 304 answer (and not send the content). So it's a client side caching layer too — all content is taken from the client side.
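The seventh layer (conditional requests) boils down to a header comparison like the one below. A simplified Python sketch, not Web Optimizer's actual code:

```python
def respond(request_headers, asset_last_modified):
    """Return (status, body_needed) for a conditional HTTP request.

    asset_last_modified is the asset's Last-Modified value in HTTP-date format.
    """
    if request_headers.get('If-Modified-Since') == asset_last_modified:
        return 304, False   # client's copy is fresh: send headers only
    return 200, True        # send full content with a fresh Last-Modified

# demo
headers = {'If-Modified-Since': 'Thu, 19 Nov 2009 10:00:00 GMT'}
print(respond(headers, 'Thu, 19 Nov 2009 10:00:00 GMT'))  # (304, False)
print(respond({}, 'Thu, 19 Nov 2009 10:00:00 GMT'))       # (200, True)
```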

Wow! It seems that's all. As you can see, there are about 7-9 different caching layers spread between the end client and the end server (the various links in a request's chain from your browser to the website and back).

In the next posts we will try to shed light on some of these layers in detail.

Nov 16, 2009

Web Optimizer 0.6.6 released

A lot of new features and bug fixes in the new version.

  • Added separation into Community / Lite / Premium Editions. The free version of Web Optimizer now becomes the Community Edition and is prohibited from being used on commercial websites. For such websites you can buy the Web Optimizer Lite Edition (data:URI + performance included, $19.99). Version comparison.
  • Added option to move all scripts (w/o merging) to </body>. If you choose to move all scripts to </body> but the Minify JavaScript option is disabled, all scripts will just be moved one-by-one to the end of the document. Very useful.
  • Added option 'Uniform cache files for all browsers'. In some cases (i.e. if you use a caching engine other than Web Optimizer's) it's incorrect to vary HTML code across browsers (yes, this disables the data:URI group of options, but allows you to cache any page for all browsers only once).
  • Added option 'Cache external files'. Now light PHP proxy can be applied to external files too. They can be downloaded, gzipped, and cached — to improve your own website load speed.
  • Added option 'Enable chained optimization'. There are a few issues with the chained optimization algorithm (due to buggy server side environments or a mix of access rights). So it can be disabled to prevent any Web Optimizer attempts to pre-optimize your pages. With this option disabled, website pages will be optimized on first visit only.
  • Added events onBeforeOptimization, onAfterOptimization, onCache to plugins API. These events can be used for standalone version to add any dynamic code to PHP, apply any internal logic, or add any dynamic pieces to cached HTML documents.
  • Improved behavior (content skipping). For content other than (X)HTML and for Ajax requests it's very tricky to apply any optimization techniques (in most cases such content has already been optimized, so it's just skipped).
  • Improved general logic in case of </body> absence. Sometimes an HTML document has no </body> tag (yes, that's true). So we need to apply all the logic in such cases too.
  • Improved multiple hosts behavior (especially for dynamic images).
  • Improved unobtrusive logic (minor fixes, added counters).
  • Improved Apache modules detection on CGI environments (especially mod_rewrite).
  • Fixed several tiny bugs in files fetching (after dozens of unit tests integrated).
  • VaM Shop and Gekklog added to supported systems.

Download the latest Web Optimizer.

Nov 14, 2009

Client Side Caching: proxy servers and forced reload

In the previous blog post we talked about caching basics. Let's now review how proxies actually work and how we can force a cache reload.

Cache reload

The main issue with far future Expires headers is that the browser doesn't re-request a resource but takes it from the local cache. So if you have made any changes on your website, they won't be visible to 'old' users (those with cached styles and scripts; HTML documents usually aren't cached so aggressively).

So what can we do about this trouble? How can we tell browsers to re-request such resources?

Main cache reload patterns

There are two main patterns to force browsers (user agents) to request the current asset once more.

  • Add any GET parameter to the file name (which should indicate the new state of this asset). For example
    styles.css -> styles.css?20091114
  • Change the 'physical' file name. For example
    styles.css -> styles.v20091114.css

Both approaches change URL of the asset and force browser to re-request it.

Cache reload and proxy servers

As you can see, the first approach is simpler than the second. But there are a few possible issues with it. First of all, some proxy servers don't cache URLs with GET parameters (i.e. our styles.css with a parameter appended). So if you have a lot of visitors from a network behind one firewall, this asset will be served to each visitor separately, without caching on the proxy server. This will slow down overall website speed, and sometimes this can be critical.

But how can we apply a new file name without actual changes on the file system? Is there any way to perform this with only a change in HTML code? Yes!

Apache rewrite rules

The Apache web server has a powerful tool to perform 'hidden' redirects to local files (these are called 'internal redirects'). We can handle the second approach with just one predefined rule for all files (in our case the version is the set of digits after .v):

RewriteEngine On
RewriteRule ^(.*)\.v[0-9]+\.css$ $1.css

So all such files will be redirected to their physical equivalents, but you can change the .v part of the URL at any time — and browsers will request the asset once more.
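For clarity, the rule's effect can be reproduced with an equivalent regular expression (Python here, purely for illustration):

```python
import re

def internal_rewrite(url):
    """Strip the '.v<digits>' version marker, like the RewriteRule above."""
    return re.sub(r'\.v\d+\.css$', '.css', url)

print(internal_rewrite('styles.v20091114.css'))  # styles.css
print(internal_rewrite('styles.css'))            # styles.css (unchanged)
```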

Automated cache reload

There are several ways to automate the cache reload process for all changed files. Since Web Optimizer combines all resources into one file, it needs to re-check the mtime (modification time) of all source files and re-combine the resources when any of them changes.

Issues with re-checking all combined files were already described last month; it's generally not good to check them all on every page visit. We can cache all previous checks into one file and check only its mtime, and this is what's done by default. By default we check the modification time of just one file (the combined CSS or JS one) and add it as a GET parameter or as a part of the file name.

So this is applied to all such files (those that should be cached on the client side) and results in the following:

/cache/website.css: as you can see, there are two possible timestamp placements for these CSS files, one as a GET parameter, the other as a part of the URL (and the Apache mod_rewrite rule transforms the latter back to /cache/website.css).

Overall schema

So what is the overall caching algorithm for the website?

  1. Check if we have the combined file. If not, create it.
  2. Check the mtime of the combined file. If required, add the mtime to the URL (using one of the described ways).
  3. The browser receives HTML code with the URL of the combined file.
  4. The browser checks if it has this URL cached. If yes, everything finishes here.
  5. If not, the browser requests the file (which is already prepared on the server or cached on a proxy).
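Step 2 of this schema (adding the mtime to the URL) can be sketched as follows. Illustrative Python; the function name and paths are hypothetical, not Web Optimizer's API:

```python
import os

def versioned_url(combined_path, base_url):
    """Build a client-cacheable URL that changes whenever the file changes."""
    mtime = int(os.path.getmtime(combined_path))
    # Variant 1: GET parameter. Variant 2 (.v<mtime> in the file name)
    # additionally needs the mod_rewrite rule shown earlier.
    return '%s?%d' % (base_url, mtime)

# demo
with open('website.css', 'w') as f:
    f.write('p { line-height: 1.5; }')
print(versioned_url('website.css', '/cache/website.css'))
```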

Nov 8, 2009

Client Side Caching: basics, automation, challenges

It was a long way to set up a correct schema handling all the issues of client side caching. Let's review it step by step.

Cache Basics

It's very simple to cache a single file: we just need to add a Cache-Control header (for HTTP/1.1) and an Expires header (for HTTP/1.0). The headers look like this:

Expires: Thu, 31 Dec 2019 23:55:55 GMT
Cache-Control: max-age=315360000

Is it so straightforward? Well, almost.

Caching on proxy

The Internet consists of different servers. Some of them are common websites. Some are transport nodes which just relay traffic from user to website and vice versa. Providers are interested in decreasing traffic, so they enable caching on their servers to serve common requests from the inner network instead of transferring them outside.

How can we affect this caching? RFC 2616 defines a way to do this: Cache-Control: public. So our example turns into:

Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000, public
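Such a header pair can also be generated programmatically. An illustrative Python sketch (the ten-year max-age matches the examples above):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def far_future_headers(max_age=315360000):
    """Build a matching Expires (HTTP/1.0) / Cache-Control (HTTP/1.1) pair."""
    expires = datetime.now(timezone.utc) + timedelta(seconds=max_age)
    return {
        'Expires': format_datetime(expires, usegmt=True),
        'Cache-Control': 'max-age=%d, public' % max_age,
    }

print(far_future_headers()['Cache-Control'])  # max-age=315360000, public
```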

Caching automation

But how can all this be applied to all files on the server? Apache has a special module to handle such situations: mod_expires. We can turn it on in the Apache configuration with these directives:

ExpiresActive On
ExpiresDefault "access plus 10 years"

This seems to be very easy. What is the trouble?

What must be cached?

Since almost all resources on a website are static (except maybe HTML documents), we can cache them all. The first approach is to cache everything but skip caching for HTML documents. It can be done this way (in the Apache configuration):

ExpiresActive On
ExpiresDefault "access plus 10 years"
<FilesMatch \.(html|xhtml|xml|shtml|phtml|php)$>
ExpiresActive Off
</FilesMatch>

So we cache everything except HTML documents. But what if we have dynamic CSS / JS files or images? I.e. thumbnails are generated via PHP, and a number of stylesheets are merged on the fly.

ExpiresByType is a magic wand!

We can use another directive to cache all required files by their MIME types (which are usually properly defined on the server). This way:

ExpiresActive On
ExpiresByType text/css A315360000

It's generally better because in the case of dynamic images we can apply cache headers by their type, not their extension. Well, is that all?

Different challenges

There can be a lot of more or less common issues with this approach:

  • There are no common MIME types for a number of resources (i.e. for font files). So we can apply cache headers to them only by their extensions, with FilesMatch. Because we can't be sure about all server environments, it's better to combine both approaches: set headers with FilesMatch and with ExpiresByType.
  • How to define which files must be cached? Web Optimizer has a great number of pre-defined MIME types and extensions to handle most static resources on the server. It also has several options to manage caching behavior.
  • If there is no mod_expires on the server, we can emulate all the caching behavior with a light PHP proxy. If mod_rewrite is available, we can just redirect all required resources internally to this proxy and get the files cached.
  • If neither mod_expires nor mod_rewrite is available on the server, we need to parse the HTML code and replace all calls to static files with their equivalents via the PHP proxy. That's all.
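The combined approach mentioned in the first point might look like this (a sketch; the extension list, MIME types, and lifetimes are illustrative, not Web Optimizer's exact rules):

```apache
<IfModule mod_expires.c>
ExpiresActive On
# By MIME type, for dynamically generated resources
ExpiresByType text/css A315360000
ExpiresByType application/x-javascript A315360000
ExpiresByType image/png A315360000
# By extension, for resources without a common MIME type (e.g. fonts)
<FilesMatch \.(ttf|otf|eot|ico)$>
ExpiresDefault "access plus 10 years"
</FilesMatch>
</IfModule>
```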

Web Optimizer not only has outstanding support for various cache techniques in different environments, but also provides a number of ways to force a cache reload (if a resource has changed). We will discuss this in the next posts.

Nov 4, 2009

Web Optimizer 0.6.5 Swift released!

We have finally tested and improved (a lot of minor stuff) the last stable product version - 0.6.5, aka Swift - before the 1.0 release. List of changes:

  • Added CSS/JS static file names setting. This can be useful for small websites with a single CSS/JS set for all pages - so it can be given a fixed name (instead of using a hash).
  • Improved chained optimization behavior. After a lot of additional tests, chained optimization (on settings save and product activation) has been significantly improved. Now it wastes even less server side resources. The WordPress and Drupal plugins gained the ability not only to completely refresh the cache on options save, but also to just save the options. In some cases the latter won't lead to cache renewal.
  • Improved cache clean up. Fixed a number of minor issues with cache directories usage (generally for the standalone application; this doesn't affect plugins / modules as they have cache directories hard coded).
  • Improved static assets proxy for compressed files. Added a number of performance improvements (logic simplified) and fixed wrong behavior on static gzip cache usage.

Download the latest Web Optimizer.