Saturday, Aug 04, 2007
Try Squid as a Reverse Proxy

This scalability strategy is brought to you by Erik Osterman:
My recommendation for anyone dealing with explosive growth on a limited budget with lots of cacheable content (i.e. content capable of returning valid expiration headers) is to employ a reverse proxy as mentioned in this article.
In the last week, we had a site get AP'd, triggering 100K unique visitors to a single IIS server in under 5 hours. It took out the IIS server. Placing a single Squid in front of the server handled the entire onslaught with a max server load of 0.10 on a modest Intel IV 3 GHz.
It's trivial to implement for anyone interested...
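As a rough illustration of what "valid expiration headers" means to a cache, here is a minimal Python sketch of how a shared cache might work out how long it may reuse a response. The function name `freshness_lifetime` and the plain-dict header format are assumptions made for this example; it is not how Squid implements the check, and it deliberately ignores details such as heuristic freshness and the `Date` header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, now=None):
    """Seconds a shared cache may reuse a response; 0 means do not reuse.

    `headers` is assumed to be a plain dict with canonical header names.
    """
    now = now or datetime.now(timezone.utc)

    # Parse "public, max-age=300" into {"public": "", "max-age": "300"}.
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # Simplification: treat no-cache like no-store instead of revalidating.
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0

    # s-maxage is aimed at shared caches, so it wins over max-age.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return max(0, int(directives[key]))
            except ValueError:
                return 0

    # Fall back to the older Expires header, measured from "now".
    expires = headers.get("Expires")
    if expires:
        try:
            delta = parsedate_to_datetime(expires) - now
        except (TypeError, ValueError):
            return 0  # unparseable or timezone-less date: play it safe
        return max(0, int(delta.total_seconds()))

    return 0  # no explicit expiration information at all
```

A response for which this returns 0 has to go back to the origin server on every request, which is why a reverse proxy only helps with content that sends such headers.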
Reader Comments (15)
I've been using Squid for different things for about 10 years now. Though it works for most cases, I've been advised to be cautious before putting it in front of high-traffic sites. The person who told me this said they had to restart Squid every few hours under very heavy load.
Yep, YouTube also mentioned some problems with squid.
From http://highscalability.com/youtube-architecture:
Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
So it's not all roses and candy.
I think that it's a kind of truism to say that as "load increased, performance eventually decreased." That's to be expected with most servers; once capacity is reached, performance often degrades exponentially as the server catches up processing all backlogged requests and still accepting new requests. I see this as an indicator it's time to scale out. I haven't done the math to calculate the cost effectiveness, but my gut instinct is that Squid is still one of the most cost effective ways to attack the problem as opposed to throwing more web servers into the pool. With its own built-in peer-to-peer caching network, Squid makes it far easier and more efficient to scale than web servers. This means that as you scale out the Squids, they can just request content from cache peers leaving the web servers free to handle new requests. Squids can handle enormous amounts of traffic well, but will get overwhelmed at a certain point; that's inevitable.
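To make the cache-peer idea concrete, here is a toy Python sketch of the lookup order: local cache first, then sibling caches, and only then the web server. The `CacheNode` class and its in-memory dicts are invented for illustration; real Squid peers talk to each other over protocols such as ICP or HTCP rather than reading each other's memory.

```python
class CacheNode:
    """Toy cache that asks its siblings before bothering the origin."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        self.store = {}        # url -> cached body
        self.siblings = []     # other CacheNode instances
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        # 1. Serve from our own cache if we can.
        if url in self.store:
            return self.store[url], "local hit"
        # 2. Otherwise see whether a sibling already holds the object.
        for peer in self.siblings:
            if url in peer.store:
                self.store[url] = peer.store[url]
                return self.store[url], "sibling hit via " + peer.name
        # 3. Only now does the request reach the web server.
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return body, "origin fetch"


origin_calls = []

def origin(url):
    origin_calls.append(url)
    return "body of " + url

a = CacheNode("a", origin)
b = CacheNode("b", origin)
a.siblings, b.siblings = [b], [a]

first = a.get("/thumb/1.jpg")    # goes to the origin
second = b.get("/thumb/1.jpg")   # answered by sibling "a"
third = b.get("/thumb/1.jpg")    # now cached locally on "b"
```

In this run the origin sees a single request even though two caches served the object three times, which is the effect described above.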
I am frankly surprised that YouTube had trouble serving its thumbnail traffic from Squid. Google has been using Squid for its thumbnails on their image search, Orkut and likely on many other properties. To be fair, cache hydration is an issue when dealing with millions of small objects. I also wish Squid would use its own object store that didn't involve storing each individual object as a file on disk.
As for Squid's stability under high load, I am curious if it still suffers from problems like that. Squid is an actively maintained project, frequently releasing updates. It's come a long way from v1. If someone is genuinely experiencing lockups under high load, I really hope they get in touch with the maintainers of Squid so as to get to the root of the problem.
No piece of your architecture will scale into infinity - Squid is no exception.
However, for sites that have very little content compared to the number of HTTP requests, and where the content can be made cacheable, the performance can be blazingly fast.
Squid can be configured to function with a memory cache, btw. I think the disk cache is optional, but I might be wrong.
Has anyone tested Squid against Varnish? I've read some articles, and watched a couple of presentations by the author, and it looks pretty promising. The author claims that Varnish will provide significantly better performance and scalability when compared with squid.
Worth reading this link to see why Varnish is better than Squid. It's all about virtual memory usage, and not fighting the operating system.
http://varnish.projects.linpro.no/wiki/ArchitectNotes
I actually found that Squid performs better when compiled with TCMalloc (from Google's perftools).
Squid used to degrade over time (several months of uptime) with an average load of 3000 requests/minute on my webserver. After compiling it with perftools, I'm no longer seeing the degradation.
In fact perftools is so good that I decided to compile MySQL with it as well, and my server runs happily now.
@Dan Kubb
> Has anyone tested Squid against Varnish? I've read some articles, and
From my testing, Varnish crushes Squid, it's not even close. I've worked with a few different load levels in a corp, mid-sized web environment - Varnish will come up 5-20x faster than Squid depending on load on identical hardware. I've worked on our Squid config for over a month and can't improve upon it, while I've hardly touched Varnish's. While I'm sure there's still a place for Squid, I would say it is more for a filter/forward proxy, and not a reverse proxy anymore.
fak3r
fak3r:
With your Squid vs Varnish comparisons, are you working with constantly full caches? Meaning, is the working set larger than the cache size? Varnish only recently had LRU eviction put in, so it can handle a consistently churning cache. Or are you testing with what can be served just from memory?
From my testing, Varnish crushes Squid, it's not even close. I've worked with a few different load levels in a corp, mid-sized web environment - Varnish will come up 5-20x faster than Squid depending on load on identical hardware. I've worked on our Squid config for over a month and can't improve upon it, while I've hardly touched Varnish's. While I'm sure there's still a place for Squid, I would say it is more for a filter/forward proxy, and not a reverse proxy anymore.
Oyun: were your tests done with working sets that were larger than could fit in memory, or on disk? I.e., was Varnish constantly evicting objects at the same time? Or were you just serving everything out of memory each time? I've yet to hear of a Varnish install that was running with constantly full caches, which is why I'm sticking with Squid.
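The difference between a working set that fits and one that does not can be shown with a small LRU cache in Python. This is only a sketch of the general LRU technique; `LRUCache` and `replay` are names made up for the example, and neither Squid nor Varnish is claimed to work exactly this way.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that counts hits, misses and evictions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry first
        self.hits = self.misses = self.evictions = 0

    def get(self, key, load):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = load(key)                    # simulated origin fetch
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
            self.evictions += 1
        return value


def replay(cache, keys):
    """Run a request sequence and return (hits, misses, evictions)."""
    for key in keys:
        cache.get(key, load=lambda k: "body of " + k)
    return cache.hits, cache.misses, cache.evictions


# Working set of 3 objects in a cache of 3: everything stays resident.
fits = replay(LRUCache(3), ["a", "b", "c"] * 3)

# Working set of 4 objects cycled through a cache of 3: constant churn.
churns = replay(LRUCache(3), ["a", "b", "c", "d"] * 3)
```

With these inputs the first run misses only while warming up, while the second misses on every request and evicts on almost every one. That is the "constantly full cache" case the question is asking about, and a benchmark that never reaches it says little about eviction cost.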
Congratulations
YouTube uses cache-unfriendly hashing algorithms to generate their URLs, I wouldn't be surprised if they did the same for thumbnails.
Squid _does_ have a single-file-blob storage engine, especially suited for small objects. It's named COSS (Cyclic Object Storage System). Unfortunately it's only available in Squid 2.7 for now; Squid 3.0 needs some stability fixes before it can safely use it. Contributions of any kind are welcome.
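Going only by the description above (one blob, written cyclically), the general idea of such a store can be sketched in a few lines of Python. `CyclicStore` is a hypothetical in-memory toy, not COSS's actual on-disk format or code: objects are appended into one fixed-size buffer, and when the write position wraps around, whatever was stored there is overwritten and forgotten.

```python
class CyclicStore:
    """Toy single-blob store: append-only writes that wrap around."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray(size)   # stands in for one large file
        self.pos = 0                 # next write offset
        self.index = {}              # key -> (offset, length)

    def put(self, key, data):
        if len(data) > self.size:
            raise ValueError("object larger than the whole store")
        if self.pos + len(data) > self.size:
            self.pos = 0             # wrap around to the start
        start, end = self.pos, self.pos + len(data)
        # Forget every object whose bytes are about to be overwritten.
        for k, (off, length) in list(self.index.items()):
            if off < end and start < off + length:
                del self.index[k]
        self.buf[start:end] = data
        self.index[key] = (start, len(data))
        self.pos = end

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None              # never stored, or already overwritten
        off, length = entry
        return bytes(self.buf[off:off + length])


store = CyclicStore(10)
store.put("a", b"aaaa")   # occupies bytes 0-3
store.put("b", b"bbbb")   # occupies bytes 4-7
store.put("c", b"cccc")   # does not fit at 8, wraps and overwrites "a"
```

The appeal for millions of small objects is that there is no per-object file to create, open or delete; eviction is simply the write pointer passing over old data.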
I wouldn't call the development rate of squid 'breakneck', but it _is_ an active project. Version 3.1 is coming Real Soon Now.
Developers actively follow the squid-users mailing-list. That's the most appropriate environment for seeking help on all things squid.
Thanks!
I am surprised that YouTube had trouble serving its thumbnail traffic from Squid. Google has been using Squid for its thumbnails on their image search, Orkut and likely on many other properties. Cache hydration is an issue when dealing with millions of small objects. I also wish Squid would use its own object store that didn't involve storing each individual object as a file on disk. Can you help me out?
Thank you so much for your nice tutorial.
Recently I set up a reverse proxy server with Squid (server accelerator) and wrote a fully detailed tutorial that you can find at:
http://cosmolinux.no-ip.org/raconetlinux/html/17-squid.html
There I explain how to configure Squid (version 3.x) as a reverse proxy server (server accelerator), with examples of how to do it using two computers (one as a proxy server and another as a web server) or just a single computer.
I also describe how to format Squid's logs and how to send the logs to a remote computer.
Also, you can find an explanation of how to deny access to certain files and how to get correct logs in Apache Web Server.
I hope it is useful to someone.