This is a guest post by Jeff Su from Factual.
Varnish is an open source, high-performance HTTP accelerator that sits in front of a web stack and caches pages. This caching layer is very configurable and can be used for both static and dynamic content.
One great thing about Varnish is that it can improve the performance of your website without requiring any code changes. If you haven’t heard of Varnish (or have heard of it, but haven’t used it), please read on. Adding Varnish to your stack can be completely noninvasive, but if you tweak your stack to play along with some of Varnish’s more advanced features, you’ll be able to increase performance by orders of magnitude.
Some of the high profile companies using Varnish include: Twitter, Facebook, Heroku and LinkedIn.
One of Factual’s first high profile projects was Newsweek’s “America’s Best High Schools: The List”. After realizing that we had only a few weeks to increase our throughput tenfold, we looked into a few options. We decided to go with Varnish because it was noninvasive, extremely fast and battle-tested by other companies. The result was a system that performed 15 times faster and a successful launch that hit the front page of msn.com. Varnish now plays a major role in our stack and we’re looking to implement more performance tweaks designed with Varnish in mind.
The easiest and safest way to add Varnish to your stack is to serve and cache static content. Aside from using a CDN, Varnish is probably the next best thing that you can use for free. However, dynamic content is where you can squeeze real performance out of your stack if you know where and how to use it. This guide will only scratch the surface of how Varnish can drastically improve performance. Advanced features such as Edge Side Includes (ESI) and header manipulation allow you to leverage Varnish for even higher throughput. Hopefully, we’ll get to more of these advanced features in future blog posts, but for now, we’ll just give you an introduction.
To install Varnish, please follow the installation guide on Varnish’s documentation page: http://www.varnish-cache.org/docs
Assuming you’ve installed it correctly, you should be able to run both your webserver and Varnish on different ports. The rest of this guide assumes that your webserver is running on port 8080 and Varnish on port 80.
Varnish uses its own domain specific language for configuration. Unlike a lot of other projects, Varnish’s configuration language is not declarative. It’s very expressive and yet easy to follow. On Ubuntu, Varnish’s config file is located at /etc/varnish/default.vcl. A lot of the examples we’ll dive into are based on Varnish’s own documentation.
This is a simple Varnish config file that will cache all requests whose URI begins with “/stylesheets”. There are a few things to note here that we’ll explain below:
    # Define your webserver.
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }

    # Incoming request.
    # Can return pass or lookup (or pipe, but that is not used often).
    sub vcl_recv {
        # Set the default backend.
        set req.backend = default;

        # Remove the Accept-Encoding header.
        unset req.http.Accept-Encoding;

        # Look up stylesheets in the cache.
        if (req.url ~ "^/stylesheets") {
            return(lookup);
        }
        return(pass);
    }

    # Called after recv and before fetch.
    # Allows for special hashing before the cache is accessed.
    sub vcl_hash {
    }

    # Called after the response has been fetched from the webserver.
    # Returns pass or deliver.
    sub vcl_fetch {
        if (req.url ~ "^/stylesheets") {
            # Remove the cookie.
            unset beresp.http.Set-Cookie;

            # Cache for 1 day.
            set beresp.ttl = 1d;
            return(deliver);
        }
    }

    # Called after fetch or after lookup yields a hit.
    sub vcl_deliver {
    }

    # Called when Varnish generates an error response.
    sub vcl_error {
    }
We unset the Accept-Encoding header in vcl_recv because Varnish doesn’t handle encodings (gzip, deflate, etc.) itself. Instead, Varnish defers to the webservers to do this. For now, we remove this header and just have the webservers give us non-encoded content. The proper way to handle encodings is to normalize the header, but we’ll discuss this later.
We unset the Set-Cookie header in vcl_fetch because we don’t want to cache session-specific content from the webserver. This is just a safeguard and is probably a little unnecessary for stylesheets, but it’s worth keeping in mind whenever you cache. We’ll discuss session-specific content later.
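The same precaution can be applied on the request side. The following is a minimal sketch, assuming your stylesheets never depend on cookies: it strips the Cookie header before the lookup so that the webserver never receives session data for these URLs. It uses the same Varnish 2.x syntax as the config above and would need to be merged into the existing vcl_recv.

```vcl
sub vcl_recv {
    if (req.url ~ "^/stylesheets") {
        # Assumption: stylesheets are identical for every visitor,
        # so the request cookie can be dropped safely.
        unset req.http.Cookie;
        return(lookup);
    }
}
```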
Returning “pass” tells Varnish to not even try to do a cache lookup. Returning “lookup” tells Varnish to look up the object in its cache instead of fetching it from the webserver. If the object is cached, the webserver is never hit. If it isn’t in the cache, Varnish fetches the content from the webserver and then calls vcl_fetch, which decides whether and for how long the response is cached.
Let’s say that we want to cache every user’s “/profile” page. This can be done by including the cookie in the hash function, so that each user gets a separate cached copy, like this:
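To see which path a request took, one option is to tag each response in vcl_deliver. The sketch below assumes Varnish 2.x, where obj.hits is readable in vcl_deliver and is 0 on a cache miss; the X-Cache header name is our own choice for illustration, not something Varnish sets by itself.

```vcl
sub vcl_deliver {
    if (obj.hits > 0) {
        # The object was served from the cache.
        set resp.http.X-Cache = "HIT";
    } else {
        # The object was passed or fetched from the webserver.
        set resp.http.X-Cache = "MISS";
    }
}
```

Requesting the same stylesheet twice and inspecting the response headers should then show whether the second request was answered from the cache.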
    sub vcl_hash {
        if (req.url ~ "^/profile$") {
            set req.hash += req.http.cookie;
        }
    }
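Note that the simple config shown earlier passes every request outside “/stylesheets”, so the per-user hash only matters once “/profile” is actually sent to the cache. Here is a rough sketch of the missing pieces, assuming a five-minute lifetime is acceptable for profile pages (the TTL is an illustrative value, not a recommendation). Both blocks would need to be merged into the existing vcl_recv and vcl_fetch, ahead of the final return(pass).

```vcl
sub vcl_recv {
    if (req.url ~ "^/profile$") {
        # Send profile requests to the cache instead of passing them.
        return(lookup);
    }
}

sub vcl_fetch {
    if (req.url ~ "^/profile$") {
        # Hypothetical TTL; pick one that matches how often profiles change.
        set beresp.ttl = 5m;
        return(deliver);
    }
}
```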
In Ruby on Rails, it is common practice to append a timestamp to the URLs of static content so that web browsers fetch a fresh copy whenever a file changes (e.g. /stylesheets/main.css?123232113). Let’s say we don’t want this timestamp to be part of the cache key for our stylesheets. Here is an example that will remove the trailing timestamp.
    sub vcl_hash {
        if (req.url ~ "^/stylesheets") {
            set req.url = regsub(req.url, "\?\d+", "");
        }
    }
Caching browser-specific content
One trick we use is to have a small portion of our CSS be browser-specific to handle various differences between browsers. We do this by having a dynamic call that serves up CSS based on the User-Agent header. The problem with this technique is that different CSS is served from the same URL. Varnish can still cache this if we add the User-Agent header to the hash, like so:
    sub vcl_hash {
        if (req.url ~ "^/stylesheets/browser_specific\.css") {
            set req.hash += req.http.User-Agent;
        }
    }
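Hashing on the raw User-Agent string creates one cached copy per distinct browser build, which can fragment the cache. A possible refinement, sketched here under the assumption that your webserver only distinguishes a handful of browser families, is to reduce the header to a coarse bucket first and hash on that instead. The X-Browser header and the bucket names are hypothetical, and the patterns would have to mirror whatever logic your webserver uses to choose the CSS.

```vcl
sub vcl_recv {
    if (req.url ~ "^/stylesheets/browser_specific\.css") {
        # Collapse the User-Agent into a few illustrative buckets.
        if (req.http.User-Agent ~ "MSIE") {
            set req.http.X-Browser = "msie";
        } elsif (req.http.User-Agent ~ "Firefox") {
            set req.http.X-Browser = "firefox";
        } else {
            set req.http.X-Browser = "other";
        }
    }
}

sub vcl_hash {
    if (req.url ~ "^/stylesheets/browser_specific\.css") {
        # Hash on the bucket rather than the full User-Agent string.
        set req.hash += req.http.X-Browser;
    }
}
```

The webserver still receives the original User-Agent header, so it can keep choosing the CSS the way it does today.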
Varnish lets you create ACLs to restrict certain requests to specific IP addresses:
    # Create the ACL.
    acl admin {
        "localhost";
        "192.168.2.20";
    }

    sub vcl_recv {
        # Protect admin URLs from unauthorized IPs.
        if (req.url ~ "^/admin") {
            if (client.ip ~ admin) {
                return(pass);
            } else {
                error 405 "Not allowed in admin area.";
            }
        }
    }
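ACL entries are not limited to single hosts. As a small sketch of the syntax, an ACL can also hold a CIDR range and a negated entry; the ACL name and the addresses below are made up for illustration.

```vcl
# Hypothetical ACL covering a whole subnet with one exception.
acl office {
    "localhost";
    "192.168.2.0"/24;    # everyone on this subnet
    ! "192.168.2.23";    # except this one address
}
```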
There are times when we need to purge certain cached objects without restarting the server. Varnish offers two ways to purge: by lookup and by URL. These examples are based on the Varnish documentation page on purging: http://www.varnish-cache.org/trac/wiki/VCLExamplePurging
Purging by lookup uses the vcl_hit function and the “PURGE” HTTP method:
    acl purgeable {
        "localhost";
        "192.168.2.20";
    }

    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purgeable) {
                error 405 "Not allowed to purge.";
            }
            # Look the object up so that vcl_hit or vcl_miss handles the purge.
            return(lookup);
        }
    }

    sub vcl_hit {
        if (req.request == "PURGE") {
            # Expire the cached object immediately.
            set obj.ttl = 0s;
            error 200 "Purged.";
        }
    }

    sub vcl_miss {
        if (req.request == "PURGE") {
            error 404 "Not in cache.";
        }
    }
Purging by URL is probably a safer bet if you are using cookies or any other tricks in your hash function. This example reuses the purgeable ACL from the previous one:
    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purgeable) {
                error 405 "Not allowed.";
            }
            purge("req.url == " req.url " && req.http.host == " req.http.host);
            error 200 "Purged.";
        }
    }
It’s good to canonicalize the Accept-Encoding header of your requests, because otherwise you could either get redundant cached objects or end up returning incorrectly encoded objects. For more details, please refer to the Varnish FAQ on Compression. Below is a snippet from that page, intended for vcl_recv.
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "Internet Explorer") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }
Let’s pretend that we have a special assets server that serves up just our stylesheets. Here is an example of having multiple backends for this purpose:
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }

    backend stylesheets {
        .host = "10.0.0.10";
        .port = "80";
    }

    sub vcl_recv {
        if (req.url ~ "^/stylesheets") {
            # Set the stylesheets backend.
            set req.backend = stylesheets;
            return(lookup);
        }
        # Set the default backend.
        set req.backend = default;
        return(pass);
    }
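Backends can also be grouped into directors, which spread requests across several servers either round-robin or at random:

    backend server1 {
        .host = "10.0.0.10";
    }

    backend server2 {
        .host = "10.0.0.11";
    }

    # Takes turns between the backends in order.
    director multi_servers1 round-robin {
        { .backend = server1; }
        { .backend = server2; }
    }

    # Picks a backend at random; each backend is given a weight.
    director multi_servers2 random {
        { .backend = server1; .weight = 1; }
        { .backend = server2; .weight = 1; }
    }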
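Defining a director does not route any traffic by itself. As a minimal sketch, assuming you want every request balanced across both servers, a director is selected in vcl_recv the same way a single backend is:

```vcl
sub vcl_recv {
    # Send requests through the round-robin director defined above.
    set req.backend = multi_servers1;
}
```

Swapping in multi_servers2 would use the random director instead.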
When we first started using Varnish, it was out of desperation and all new to us. Over the past year, we’ve been figuring out ways to leverage its performance in more creative ways. At this point, we couldn’t imagine putting together a stack that didn’t include this great project.
We hope this post has been helpful for anyone interested in getting Varnish set up for the first time.