Stuff The Internet Says On Scalability For December 5, 2011
Monday, December 5, 2011 at 9:30AM
HighScalability Team in hot links
It's HighScalability Time!
Quotable quotes:
@jaykreps : Was wondering, How can I turn my boring, cachable, read-only traffic into random writes on mongodb? And lo! link
@marshallk : Google runs 100-200 experiments every day on UI, algorithm & product
@styggiti : The problem with companies like IBM and Oracle baking NoSQL "scalability" into their products isn't the tech, it's the $$ licensing.
Blazing fast node.js: 10 performance tips from LinkedIn Mobile. You may have thought that node.js magically made everything fast, but Shravya Garlapati has some great strategies for going even faster: Avoid synchronous code; Turn off socket pooling; Don't use Node.js for static assets; Render on the client-side; Use gzip; Go parallel; Go session-free; Use binary modules; Use standard V8 JavaScript instead of client-side libraries; Keep your code small and light.
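To make the "Go parallel" tip concrete, here is a minimal sketch of the idea. It is written in Python with asyncio rather than node.js, and it is not code from the LinkedIn article. The fetch function and the names "profile", "feed", and "ads" are hypothetical stand-ins for independent backend calls.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Hypothetical backend call; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def load_sequential() -> list[str]:
    # Each call waits for the previous one, so the delays add up.
    return [
        await fetch("profile", 0.05),
        await fetch("feed", 0.05),
        await fetch("ads", 0.05),
    ]


async def load_parallel() -> list[str]:
    # All three calls are in flight at once; results keep argument order.
    results = await asyncio.gather(
        fetch("profile", 0.05),
        fetch("feed", 0.05),
        fetch("ads", 0.05),
    )
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(load_parallel()))
```

Both functions return the same list. The parallel version should take roughly as long as the slowest call instead of the sum of all three, provided the calls really are independent.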
Nice thread in NoSQL Databases on HBase and Consistency in CAP. The short summary of the article is that CAP isn't "C, A, or P, choose two," but rather "When P happens, choose A or C."
Fast, easy, realtime metrics using Redis bitmaps. Chandra Patni explains the innovative use of Redis bitmaps to handle problems like a daily unique user count for 128 million users in 50 ms. The advantages are speed and space efficiency. I like the idea of keeping a separate bitmap index for each account to facilitate keeping stats by bitmap.
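The bitmap trick is easy to see in miniature. The sketch below is not Chandra Patni's code and does not talk to Redis. It uses a Python integer as an in-memory stand-in for a bitmap, and the class name, method names, and key names such as "active:2011-12-05" are all hypothetical. Redis offers comparable operations through SETBIT, BITCOUNT, and BITOP. One bit per user is also where the space saving comes from: 128 million users fit in 128,000,000 / 8 = 16,000,000 bytes, about 16 MB per bitmap.

```python
from functools import reduce
from operator import and_, or_


class BitmapStats:
    """In-memory stand-in for per-key bitmaps; bit N means user N was seen."""

    def __init__(self) -> None:
        self._bits: dict[str, int] = {}

    def mark(self, key: str, user_id: int) -> None:
        # Setting the same bit twice is harmless, which gives uniqueness for free.
        self._bits[key] = self._bits.get(key, 0) | (1 << user_id)

    def count(self, key: str) -> int:
        # Population count: the number of set bits is the number of unique users.
        return bin(self._bits.get(key, 0)).count("1")

    def combine(self, op, dest: str, *keys: str) -> None:
        # op is or_ (union) or and_ (intersection); needs at least one key.
        self._bits[dest] = reduce(op, (self._bits.get(k, 0) for k in keys))


stats = BitmapStats()
for uid in (1, 5, 5, 9):
    stats.mark("active:2011-12-04", uid)
for uid in (5, 20):
    stats.mark("active:2011-12-05", uid)

# Users seen on either day, and users seen on both days.
stats.combine(or_, "active:either", "active:2011-12-04", "active:2011-12-05")
stats.combine(and_, "active:both", "active:2011-12-04", "active:2011-12-05")
```

Here the first day counts 3 unique users even though user 5 was marked twice, the union counts 4, and the intersection counts 1.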
BitTorrent’s µTP protocol routes around TCP's near-fatal resend-on-congestion algorithm by running over UDP and, when congestion is detected, yielding until the pipes clear up again. Good article by Janko Roettgers: How BitTorrent wants to save the Internet. IETF Proposal.
Lars George has written a great book on HBase: HBase: The Definitive Guide. The Architecture chapter especially has an illuminating explanation of hard disk seek vs. transfer tradeoffs in database design and how they relate to B+Trees vs. Log-Structured Merge-Trees.
Scaling with the Kindle Fire. Good article on how Pulse, a news reading app for iPhone, iPad, and Android, uses GAE to serve hundreds of millions of requests per day. Use memcache, set cache control headers, tune down instance creation, buy the Premier account, split load across different applications.
Cloud email service price comparison. Will compares a bunch of email services. Sending bulk email is a huge PITA so it's good to see a comparison. On Hacker News. "So which email cloud provider should you use? Use the graphs I made, but price is only going to be one factor, so check what each provider offers. I’ve linked to all the pricing pages below."
Programming language impact on the development of distributed systems by Debasish Ghosh, Justin Sheehy, Kresten Krab Thorup, Steve Vinoski. In this paper, we first present a history of programming languages and distributed systems, and then explore several alternative languages along with modern systems built using them.
Netflix has released: Curator - The Netflix ZooKeeper Library. Looks like a great, easy-to-use wrapper on top of ZooKeeper. The article lists the common problems with ZooKeeper and how they handle them. At Netflix ZooKeeper is used for: locks for sequence ID generators; Cassandra backups; TrackID Service; leader selection; locking 3rd party services; caching.
Cassandra at Gowalla. Adam Keys with a generous discussion of their Cassandra usage. "It’s become our database of choice for applications with relatively fixed query patterns that, for us to succeed, need to handle a rapidly growing dataset."
Stale cache serving strategy with reactive flush. John Clarke Mills has achieved the goal of no cache misses by first building what I call a persistent cache: memcache backed by disk or database.
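Here is one way the persistent cache idea might look. This is a rough sketch under my own assumptions, not John Clarke Mills's implementation. Two dicts stand in for memcache and for the disk or database layer, and the PersistentCache class, its method names, and the compute callback are hypothetical. Reads never recompute once a value exists. They serve whatever is stored, even if stale, and a refresh triggered by a data change overwrites the entry in place.

```python
from typing import Callable


class PersistentCache:
    """Fast tier backed by a durable tier; entries are overwritten, never expired."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute             # hypothetical expensive loader
        self._fast: dict[str, str] = {}     # stand-in for memcache
        self._durable: dict[str, str] = {}  # stand-in for disk or database
        self.computations = 0

    def get(self, key: str) -> str:
        if key in self._fast:
            return self._fast[key]
        if key in self._durable:
            # Fast tier lost the entry: repopulate it without recomputing.
            self._fast[key] = self._durable[key]
            return self._fast[key]
        # Only the very first request for a key pays the compute cost.
        return self.refresh(key)

    def refresh(self, key: str) -> str:
        """Reactive flush: call when the underlying data changes."""
        value = self._compute(key)
        self.computations += 1
        self._durable[key] = value
        self._fast[key] = value
        return value

    def evict_fast(self, key: str) -> None:
        # Simulates memcache dropping an entry under memory pressure.
        self._fast.pop(key, None)


source = {"home": "v1"}
cache = PersistentCache(lambda key: source[key])
cache.get("home")         # computed once, stored in both tiers
cache.evict_fast("home")  # the fast tier forgets the entry
cache.get("home")         # served from the durable tier, no recompute
```

The tradeoff is that readers may see stale data between a change and its refresh, in exchange for never waiting on a recompute.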
Nigel Poulton has Seen The Future of SSD Arrays! and it is going to look like an industry standard off the shelf x86 server, crammed full of industry standard form-factor hot-pluggable SSD drives, running SCSI over PCIe with all of the smarts and clevers in software.
If you are interested in Software Defined Networking, there's a new SDN meetup worth checking out.
deviantArt shows Faster Web Development with Virtual Machines. Every developer gets their own virtual machine, which means: Fewer Commits, Reduced Contention for the Staging Server, Freedom to Experiment. Also, Chaos Gerbils: An Explanation - a gripping story of recovering from a data corruption bug with statement-based replication used for unique IDs.
Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.