Over the last few days we have been working on improving the very core logic of Web Optimizer: fetching files and checking cache integrity. Why is this so important? After all cache files have been created (merged JavaScript and CSS, CSS Sprites and data:URI resources, gzipped versions, etc.), the website should be served as fast as possible. Client side optimization is of little use if it wastes the server's time.
Right now we have reached 10x better performance for the full version (and up to 20x with unobtrusive logic and multiple hosts disabled). How is this possible?
General flow
Here is a pie chart of time consumption within the Web Optimizer logic:
This chart is valid for both versions, demo and full, but this distribution can be tuned for your website only with the full version: the demo version doesn't include the performance group of options.
Cache integrity
Why is cache integrity so important? Because we need to be sure that all merged and minified files are up to date. It would be very bad if cache files could be created only once, or if every small change in the website's design or layout forced re-creation of the entire website cache. Web Optimizer checks cache integrity on the fly, and does this perfectly.
But there is a serious performance cost: on every hit to your website, Web Optimizer checks all files listed in the HTML document, re-calculates their checksum, verifies that the corresponding cache files exist, and only then serves the optimized content. This is safe, but excessive. Websites usually don't change for months or even years, so we don't need to check these files thousands of times a day.
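To make the cost concrete, here is a minimal PHP sketch of such a per-request check. This is illustrative only, not Web Optimizer's actual code; the function name, file names and the crc32 key scheme are assumptions. Every request pays for reading every listed file from disk:

<?php
// Naive per-request integrity check: the cache key is a checksum of
// the combined contents of all files referenced in the document.
function cache_key_from_contents(array $files) {
    $contents = '';
    foreach ($files as $file) {
        $contents .= file_get_contents($file); // expensive FS call on every hit
    }
    return sprintf('%08x', crc32($contents));
}

$files = array('css/layout.css', 'css/main.css'); // parsed from the HTML
$cache = 'cache/' . cache_key_from_contents($files) . '.css';

if (!file_exists($cache)) {
    // Only now re-create the merged and minified cache file.
}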
The first point: do not check for file changes
We can skip re-calculation of checksums over file contents, which brings about a 2-3x speedup by eliminating very expensive file system calls. The trade-off is that the cache must be cleaned up manually whenever the physical files change. But on a live website this happens rarely, and skipping the checks saves 50-70% of the server time spent on Web Optimizer actions.
This logic is controlled by the "Ignore file modification time stamp" option in the Performance section. The difference between this option and the fourth option below (which skips file system calls entirely) is the following.
If you change a file's content (i.e. add a few styles to main.css) with this option disabled (file modification checks enabled), Web Optimizer will fetch the content of main.css and compare it with the previous version (by checksum). If the checksum differs, a new cache file is created. With this option enabled (file modification checks disabled), Web Optimizer takes into account only the file names (usually everything listed in the head section), not their contents.
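A short sketch of the two strategies (PHP, with assumed function names; this is a simplification of the described behavior, not the actual code):

<?php
// Check enabled: the key depends on file contents, so any edit to
// main.css produces a new checksum and therefore a new cache file.
function key_by_contents(array $files) {
    $data = '';
    foreach ($files as $file) {
        $data .= file_get_contents($file); // one FS read per file, per hit
    }
    return md5($data);
}

// Check disabled ("Ignore file modification time stamp" on): the key
// depends on file names only. No FS reads, but editing main.css
// without renaming it will NOT refresh the cache -- clean it manually.
function key_by_names(array $files) {
    return md5(implode(',', $files));
}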
The second point: avoid regular expressions
Further investigation into what was slow in the Web Optimizer core logic shed light on a lot of regular expressions. RegExps are very handy if you need to get something done quickly and be sure of the result, but they are also very expensive. In most cases they can be replaced with raw string functions (i.e. strpos, substr, etc.). Well-formed, standards-compliant websites can be parsed very quickly, so why should Web Optimizer be slow for them? Only old-school, malformed websites, which have to be parsed the standard (regex) way, should pay that price.
This logic is managed by the "Do not use regular expressions" option. This approach saves about 15-20% throughout the Web Optimizer core.
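For illustration, here is roughly what such a replacement looks like: a simplified sketch that assumes well-formed, lowercase, double-quoted markup; it is not the actual parser:

<?php
// Regular expression version: short, but expensive on large documents.
function extract_hrefs_regex($html) {
    preg_match_all('/<link[^>]+href="([^"]+)"/i', $html, $matches);
    return $matches[1];
}

// Raw string functions version: more code, but much cheaper calls.
function extract_hrefs_strpos($html) {
    $hrefs = array();
    $pos = 0;
    while (($pos = strpos($html, '<link', $pos)) !== false) {
        $end = strpos($html, '>', $pos);
        $tag = substr($html, $pos, $end - $pos);
        $h = strpos($tag, 'href="');
        if ($h !== false) {
            $h += 6; // skip past 'href="'
            $hrefs[] = substr($tag, $h, strpos($tag, '"', $h) - $h);
        }
        $pos = $end;
    }
    return $hrefs;
}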
The third point: quick check
So we have reduced file system calls (from dozens to 2-3) and replaced regular expressions with string functions. What else? The next step is to reduce the overall number of logic operations. While fetching all styles and scripts we perform a lot of operations: get tags, get attributes, correct attributes, check options, check values, etc.
All of this can be skipped during a plain cache integrity check. Reducing this logic to the minimum (just enough to be sure we can serve the same cached files for the same set of styles and scripts) brings an additional 10% in performance. Not much on its own, but together with the other approaches it is enough to provide the fastest client side optimization solution.
This is the "Check cache integrity only with head" option.
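A rough sketch of the idea (assumed names and a simplified key scheme, shown only to clarify the fast path):

<?php
// Quick check: hash the raw <head> markup instead of parsing it.
// The same head markup implies the same set of styles and scripts,
// and therefore the same cache files.
function quick_cache_key($html) {
    $start = strpos($html, '<head');
    $end = strpos($html, '</head>');
    return md5(substr($html, $start, $end - $start));
}

$key = quick_cache_key($html); // $html is the page being generated
if (file_exists('cache/' . $key . '.css')) {
    // Fast path: serve the already-known cache files, no full parsing.
} else {
    // Slow path: run the complete fetch / merge / minify logic once.
}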
The fourth point: reduce even more
OK, but a reasonable question arises: why do we need to perform any file system calls at all? The answer is simple: we need to force a cache reload on the client side when the cache file name stays the same but its content has changed (i.e. the same set of scripts with updated contents). The reload is forced by an additional GET parameter (in the demo version) or by a changed file name (via mod_rewrite in the full version). These operations (checking that the cache file exists and reading its mtime) can be avoided if we hard-code the 'version' of our website application. This reduces file system calls to zero.
But this is generally dangerous: we can no longer check cache integrity properly and may reference files that don't exist. It should be enabled only after all cache files have been created, when we are just tuning the server side for the best performance.
This is the "Cache version number" option (a zero value disables it).
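A sketch of how a hard-coded version removes the remaining FS calls (the constant name and URL scheme are assumptions for illustration):

<?php
define('CACHE_VERSION', 42); // bump manually after re-creating the cache

function versioned_url($file) {
    if (CACHE_VERSION > 0) {
        // No file system access at all: version comes from configuration.
        return $file . '?v=' . CACHE_VERSION; // GET parameter, demo-style
    }
    // Zero value: fall back to the file's mtime (one FS call per file).
    return $file . '?v=' . filemtime($file);
}

echo versioned_url('cache/abc123.css'); // cache/abc123.css?v=42

In the full version the same idea works through mod_rewrite: the version is embedded in the file name itself (something like abc123.v42.css) rather than in a query string, which also plays better with proxies that refuse to cache URLs with GET parameters.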
Finally
Now we have the following picture (in relative numbers):
As you can see, almost all parts are balanced to achieve exceptional server side performance (usually 1-2-5 ms in the full version versus 20-40-100 ms for the demo version). All these options are included in the nightly builds and will be available in 0.6.3 after complete testing.