Strategy: Front S3 with a Caching Proxy

Given S3's recent failure (Cloud Status tells the tale), Kevin Burton makes the excellent suggestion of fronting S3 with a caching proxy server.
A caching proxy server can reply to service requests without contacting the specified server by returning content saved from a previous request, whether that request was made by the same client or by another one. This is called caching. Caching proxies keep local copies of frequently requested resources. In normal operation, when an asset (a user's avatar, for example) is requested, the cache is tried first. If the asset is found in the cache, it's returned. If the asset is not in the cache, it's retrieved from S3 (or wherever) and cached. So when S3 goes down, it's likely you can ride out the downtime by serving assets out of the cache.
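As a rough illustration of that read path, here is a minimal Python sketch of a read-through cache. It is a toy model, not how Squid, Nginx, or Varnish are implemented: the names CachingProxy, FakeOrigin, and OriginUnavailable are made up for this example, the origin is an in-memory stand-in for S3, and expiry and eviction are left out entirely.

```python
from typing import Callable, Dict, Optional


class OriginUnavailable(Exception):
    """Raised by a fetch function when the origin (e.g. S3) cannot be reached."""


class CachingProxy:
    """Toy read-through cache. Hypothetical; no expiry or eviction."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # The cache is tried first.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: go to the origin and remember the result.
        try:
            body = self._fetch(key)
        except OriginUnavailable:
            # Origin is down and we never cached this asset: nothing to serve.
            return None
        self._cache[key] = body
        return body


class FakeOrigin:
    """In-memory stand-in for S3 so the sketch runs without a network."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.up = True
        self.calls = 0  # how many requests actually reached the origin

    def fetch(self, key: str) -> bytes:
        if not self.up:
            raise OriginUnavailable(key)
        self.calls += 1
        return self.objects[key]


if __name__ == "__main__":
    origin = FakeOrigin({"avatars/alice.png": b"alice", "avatars/bob.png": b"bob"})
    proxy = CachingProxy(origin.fetch)

    proxy.get("avatars/alice.png")          # miss: fetched from origin, cached
    origin.up = False                       # simulate an S3 outage
    print(proxy.get("avatars/alice.png"))   # still served, from the cache
    print(proxy.get("avatars/bob.png"))     # never cached, so unavailable
```

The sketch also shows the limit of the approach: only assets requested before the outage survive it, so the cache needs time to warm up with the commonly used items.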
This strategy only works when using S3 as a CDN. If you are using S3 for its "real" purpose, as a storage service, then a caching proxy can't help you...
Amazon doesn't use S3 as a CDN either (see "Amazon Not Building Out AWS To Compete With CDNs"). They use Limelight Networks.
Some proxy options are: Squid, Nginx, Varnish.
Planaroo shares how a small startup responds to an S3 outage (summarized):
Reader Comments (5)
Make GETs go through the caching proxy. The POST/PUT/DELETE requests may have to fail for a while, but that's not 100% avoidable. Most all existing content continues to return. Random requests for really old stuff may return 404, but hey, they're random and old!
I was looking at a similar solution offered by simpleCDN.com today. From their site:
SimpleCDN's PPU-MirrorCDN service automatically mirrors and caches your content from any HTTP accessible file store, making your content instantly available via SimpleCDN's Content Delivery Network.
MirrorCDN is not a storage or archival service, rather it acts as a global HTTP Accelerator. Think of MirrorCDN as being similar to Varnish or Squid Cache, except with many more features, and distributed across the world.
MirrorCDN is a pay per use service, with usage billed per GB of data transferred to end users. There are no additional storage or per-request charges.
"If you are using S3 for its "real" purpose, as a storage service, then a caching proxy can't help you."
Todd: I'm not sure how you arrive at this. Unless the working set of objects being requested is either prohibitively large or very dynamic, how wouldn't a caching proxy help? With S3 charging for bits coming in and out, I can't see how reverse proxying could ever hurt. Am I misunderstanding something about your comment?
> I'm not sure how you arrive at this
I took Highway 280, which is lovely, but is sometimes quite backed up :-) My hopefully cogent point was that if you are using S3 to store files, then having your own cache won't help because the write will not succeed. If you are using S3 as a CDN, then it is primarily for reading, which means having your own cache will indeed help, assuming the cache has had enough time to fill with your more commonly used and important items. If assets beyond this working set are used, you are indeed still screwed.
Hi there,
Their "MirrorCDN" service seems to serve requests from both Los Angeles and Europe via a server in Chicago. Not much "distribution network" in that. Avoid avoid ...