Using Varnish for Paywalls: Moving Logic to the Edge

This is a guest post from Per Buer, founder and CEO of Varnish Software, provider of Varnish Cache, an open source web application accelerator freely available at varnish-cache.org. Varnish powers a lot of really big websites worldwide.
We at Varnish Software are all about speed. Varnish Cache is built for speed. It executes its policy code more or less a thousand times faster than a typical Java- or PHP-based application server, mostly because the configuration is compiled into machine code that makes no system calls.
System calls require expensive context switches, stall the CPU and wreak havoc in the CPU cache, so avoiding them makes the code fly. There are strong limitations on what kind of logic you can move into Varnish Cache, but the logic that you do move there will run very fast.
One example is access control: serving access-controlled content straight from the caching edge layer.
The Varnish Paywall
Who gets to access your content? In a traditional environment the caching layer only serves up pieces of content without giving any thought to who gets access to them. Since the rules governing access control can be rather complex, they have traditionally been implemented in the application server, which is slow.
We’ve seen companies struggle with performance when a paywall suddenly forces them to serve content from their application layer again. With a bit of effort and some open source magic you can have your cake and eat it too: serve access-controlled content from the caching edge layer.
How would it work?
Varnish Cache would need two pieces of information. One would be a header coming from the origin server indicating that this piece of content is under access control, maybe X-Access-Control. If the header is present, Varnish would then check whether the user is logged in, using a cookie. This cookie would be set by an authentication service, and if you are worried about users cheating you could secure it by signing the cookie cryptographically. It’s possible, but not recommended, to implement the actual authentication in Varnish itself, using modules to access data in a database or another data source. As each user usually only logs in once, this is not a performance-critical path, so it makes more sense to do it on your regular application servers.
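The decision described above can be sketched in a few lines. This is a minimal illustration in Python rather than VCL, assuming an HMAC-SHA256 signature; the cookie name `auth`, the `user.signature` layout, the secret, and the function names are all hypothetical. In Varnish the equivalent check would live in VCL with a digest module.

```python
import hashlib
import hmac

# Hypothetical shared secret: the authentication service and the edge both need it.
SECRET_KEY = b"replace-with-a-real-secret"


def _mac(user_id: str) -> str:
    """Hex HMAC-SHA256 of the user id under the shared secret."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_cookie(user_id: str) -> str:
    """What the authentication service would set: '<user_id>.<signature>'."""
    return f"{user_id}.{_mac(user_id)}"


def verify_cookie(cookie_value: str) -> bool:
    """Check that the signature part matches the user id part."""
    user_id, sep, mac = cookie_value.rpartition(".")
    if not sep or not user_id:
        return False
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(mac.encode("utf-8"), _mac(user_id).encode("utf-8"))


def may_serve(response_headers: dict, cookies: dict) -> bool:
    """Decide whether a cached object may be delivered to this client."""
    if "X-Access-Control" not in response_headers:
        return True  # not under access control, serve to anyone
    return verify_cookie(cookies.get("auth", ""))
```

The edge only verifies the signature. It never issues one, so the login path stays on the application servers.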
Authenticated access is not the only option. You might also want to limit access to, say, 5 articles per user per week. If so, you would store the read article count in a signed cookie or a NoSQL database like Redis or MemcacheDB. To do this you extend Varnish with a Varnish Module (VMOD), as VCL itself lacks flow control structures such as for loops.
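As a rough sketch of the signed-cookie variant of metering, the counter can travel in the cookie together with the week it applies to. The payload layout (`<ISO week>:<count>.<signature>`), the secret, and the function names below are assumptions made for illustration only.

```python
import datetime
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical shared secret
FREE_ARTICLES_PER_WEEK = 5


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def week_label(day: datetime.date) -> str:
    """ISO year and week number, e.g. '2024-W03'."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def record_read(cookie_value: str, today: datetime.date):
    """Return (allowed, cookie_to_send_back) for one article request."""
    week = week_label(today)
    count = 0
    payload, sep, mac = cookie_value.rpartition(".")
    if sep and hmac.compare_digest(mac.encode("utf-8"), _sign(payload).encode("utf-8")):
        cookie_week, _, raw_count = payload.partition(":")
        # A counter from an earlier week is ignored, which resets the quota.
        if cookie_week == week and raw_count.isdecimal():
            count = int(raw_count)
    if count >= FREE_ARTICLES_PER_WEEK:
        return False, cookie_value
    new_payload = f"{week}:{count + 1}"
    return True, f"{new_payload}.{_sign(new_payload)}"
```

A cookie-only counter starts again from zero if the reader deletes the cookie. Keeping the count server-side, for example in Redis, avoids that at the cost of a lookup per request.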
What tools are needed? How much effort?
Getting a proof of concept up and running should be pretty fast. You could just check for the presence of a certain value in a cookie and you’ve proven the concept. The more advanced controls will require more effort, maybe a week or two of work, depending on the complexity.
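A proof of concept along those lines needs little more than parsing the Cookie header. The following Python sketch assumes a hypothetical cookie named `subscriber` with the value `yes`. It is not secure, since anyone can set that cookie by hand, but it is enough to prove the flow.

```python
from http.cookies import CookieError, SimpleCookie


def has_subscriber_cookie(cookie_header: str) -> bool:
    """Return True if the raw Cookie header carries subscriber=yes."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        return False  # treat an unparsable header as "not logged in"
    morsel = jar.get("subscriber")
    return morsel is not None and morsel.value == "yes"
```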
Caution
Be wary when moving logic away from your application servers. You must always maintain clear guidelines on what goes where or you’ll quickly end up with a messy infrastructure. In some organizations the edge cache is considered infrastructure rather than part of the web application, and it is handled by a different team.
Resources
- Digest VMOD, https://www.varnish-cache.org/vmod/digest
- Redis VMOD, https://www.varnish-cache.org/vmod/redis
- Writing VMODs, http://blog.zenika.com/index.php?post%2F2012%2F08%2F21%2FCreating-a-Varnish-module
- The Varnish Book, https://www.varnish-software.com/book
- The Official Varnish Documentation, https://www.varnish-cache.org/docs/
Reader Comments (6)
This is great. Setting up an authorization wall with Varnish can also be useful if you want to use Varnish with an intranet where authorization is required. Is there any more specific documentation on the subject? I have previously been looking around for it but it is really lacking.
Pretty vague post. Not up to your usual standard.
TL;DR: Please try our caching software. We think it's quite fast.
I really think this article could have used more content. It reads more like a promotional booklet than a technical article.
Agreed, article is more of a teaser, and recommended resources are very broad and generic. I would have liked to see more depth on how to structure things, pitfalls to avoid, etc. It's not clear if the writer has actually set up a large paywall environment, or if he just thinks Varnish would be a great tool to use if he did. Would be nice to have some insights into problems he encountered prior to using Varnish.
Merely checking for the presence of a cookie isn't sufficient for a paywall; even if it is signed it can still be hijacked. You need some way to validate the value on each request. And if you do that on your "regular application servers" you are looking at zero caching for all logged in requests.
It would seem that a module to perform cookie validation is required (eg. via remote service call to an auth service).
@AW: Most paywalls have a relaxed attitude towards security and are not too worried about session hijacking. Case in point: the NY Times paywall could be bypassed by disabling JavaScript. Of course one could tie the session cookie to an IP address, which would secure it against hijacking, but in 90% of the cases the content isn't valuable enough in itself to do so.
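For illustration, tying the cookie to an IP address can be as simple as including the address in the signed data. This is a Python sketch under assumed names (the secret, cookie layout, and functions are hypothetical), not the way any particular paywall does it.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical shared secret


def _mac(user_id: str, client_ip: str) -> str:
    """Sign the user id together with the address the cookie was issued to."""
    message = f"{user_id}|{client_ip}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_bound_cookie(user_id: str, client_ip: str) -> str:
    return f"{user_id}.{_mac(user_id, client_ip)}"


def verify_bound_cookie(cookie_value: str, client_ip: str) -> bool:
    """Valid only when presented from the address it was issued to."""
    user_id, sep, mac = cookie_value.rpartition(".")
    if not sep or not user_id:
        return False
    expected = _mac(user_id, client_ip)
    return hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8"))
```

The trade-off is that readers whose address changes, for example on mobile networks, have to log in again.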
@Lasse: No. There isn't really much specific documentation. The VCL language itself has OK docs but the "cookbook" is a bit on the thin side. I'm currently rewriting most of the docs and I'll see if there is a nice place to put an example that goes into more detail.
I agree the post is somewhat vague. It does not contain a code listing of the relevant VCL code but tries to outline how it could be solved. The code is seldom reused as there are enough differences between each implementation to make most of it specific to the particular implementation. There are parts that are generic, such as validating cookies, but I deem it to be trivial and there are examples out there if anybody does a search.
Anyway, it isn't black magic, and anybody who sits down for a couple of hours should be able to write something from scratch.
As noted we should have listed pitfalls. Here are a couple we've met so far:
1) VCL does not have loops. Unless of course you use loops in inline-C and wrap VCL code. I would advise against that as it would be butt ugly and hard to maintain. We usually end up building the stuff that requires loops in a module to keep the config clean.
2) Date manipulation in VCL is nonexistent. Again, this is something we chuck into a module. We're planning to release an open source module to assist with this at a later point.
3) Session timeouts should be handled in a graceful manner. Until now we've just redirected the client back to the SSO and let it reissue the cookie and redirect back. One could also move this to a client side script that would talk directly to the SSO and extend the lifetime of the session cookie.
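The redirect approach to session timeouts might look roughly like this. It is a Python sketch with assumed details: the SSO URL, the `return_to` parameter, and the `<user>:<expiry>.<signature>` cookie layout are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-real-secret"       # hypothetical shared secret
SSO_RENEW_URL = "https://sso.example.com/renew"  # hypothetical SSO endpoint


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_session(user_id: str, expires_at: int) -> str:
    """What the SSO would issue: '<user_id>:<unix expiry>.<signature>'."""
    payload = f"{user_id}:{expires_at}"
    return f"{payload}.{_sign(payload)}"


def check_session(cookie_value: str, requested_url: str, now=None):
    """Return ('serve', None), or ('redirect', url) when the session must be renewed."""
    now = time.time() if now is None else now
    payload, sep, mac = cookie_value.rpartition(".")
    if sep and hmac.compare_digest(mac.encode("utf-8"), _sign(payload).encode("utf-8")):
        _user_id, _, raw_expiry = payload.rpartition(":")
        if raw_expiry.isdecimal() and int(raw_expiry) > now:
            return "serve", None
    # Expired or invalid: send the client to the SSO, which reissues the
    # cookie and redirects back to the page that was requested.
    query = urlencode({"return_to": requested_url})
    return "redirect", f"{SSO_RENEW_URL}?{query}"
```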
I hope that answers most of your questions for now.