
Monday
Oct 29, 2012

Gone Fishin' Two

Well, not exactly fishin'. I'll be on vacation starting today, and I'll be back in late November. I won't be posting anything new, so we'll all have a break. Disappointing, I know, but fear not: I will be posting some oldies for your re-enjoyment.

And if you've ever wanted to write an article for HighScalability, this would be a great time :-) I especially need help writing Stuff the Internet Says on Scalability, as I will be reading the Interwebs on a much reduced schedule. Shock! Horror! So if the spirit moves you, please write something.

My connectivity in Italy will probably be good, so I will check in and approve articles on a regular basis. Ciao...

Friday
Oct 26, 2012

Stuff The Internet Says On Scalability For October 26, 2012

It's HighScalability Time:

  • 1.5 Billion Pageviews: Etsy in September; 200 dedicated database servers: Tumblr
  • Quotable Quotes:
    • @rbranson: Datadog stays available where it counts (metrics ingest) by using Cassandra, combined with an RDBMS for queries. Nice.
    • @jmhodges : Few engineers know what modern hw is capable of, in part, because the only people that see the numbers are in orgs that had to care or die.
    • @tinagroves : Storing the brain in the cloud might cost $38/month asserts Jim Adelius at #strataconf in talk on #bigdata and thought crimes.

  • Why is it hard to scale a database, in layman’s terms? on Quora. Some really good answers. My answer would involve a cookie jar filled with all different kinds of cookies and a motley crew of kindergartners all trying to grab cookies while Keebler elves try to refill the jar, all at the same time. (A toy version in code follows this list.)

  • Rackspace now has their own cloud block storage product, competing with Amazon's EBS. It looks like they are positioning their offering against both Amazon and GCE with a simple pricing model and both fast and consistent performance. It may be a bit more expensive. A possible plus for the future of Open is that they are using OpenStack Cinder APIs. Given that we are still talking about a shared block storage system, it will be interesting to see if we see some of the same large-scale failures that we have seen with EBS. Plan accordingly.
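
Since the cookie-jar analogy is really about concurrent access to shared state, here is a toy version of it in Python. Everything in it is invented for illustration; the point is just that every kid and elf queues on the same lock, and that kind of serialization point is what makes a single database hard to scale.

    import threading

    jar = []                           # the shared cookie jar
    jar_lock = threading.Lock()        # one lock everyone must take

    def elf(batch):                    # writer: refills the jar
        for i in range(batch):
            with jar_lock:
                jar.append("cookie")

    def kid(wanted):                   # reader/consumer: grabs cookies
        taken = 0
        while taken < wanted:
            with jar_lock:             # kids and elves all take turns here
                if jar:
                    jar.pop()
                    taken += 1

    threads = [threading.Thread(target=elf, args=(1000,)) for _ in range(2)]
    threads += [threading.Thread(target=kid, args=(500,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Adding more kids or elves doesn't help: they all contend on one
    # lock, just like hot rows or pages in a single database.
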
Don't miss all that the Internet has to say on Scalability: click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Oct 25, 2012

Not All Regions are Created Equal - South America Es Bueno

Rodrigo Campos shared some interesting benchmark results for AWS instances in the South America Region (São Paulo) and US East Region (North Virginia). He summarized the results in this thread on the Guerrilla Capacity Planning list:

Click to read more ...

Wednesday
Oct 24, 2012

Saving Cash Using Less Cache - 90% Savings in the Caching Tier

In a paper delivered at HotCloud '12 by a group from CMU and Intel Labs, Saving Cash by Using Less Cache (slides, pdf), they show it may be possible to use less DRAM under low load conditions to save on operational costs. There are some issues with this idea, but in a "give me more cache" era, it could be an interesting source of cost savings for your product.
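
To make the idea concrete, here is a minimal sketch of load-aware cache provisioning. The CacheCluster manager, the per-node throughput figure, and the headroom factor are all hypothetical; this is the shape of the idea, not the paper's system: size the caching tier for current load plus headroom rather than for peak.

    class CacheCluster:
        """Hypothetical stand-in for whatever manages your cache fleet."""
        def __init__(self, min_nodes, max_nodes):
            self.min_nodes = min_nodes
            self.max_nodes = max_nodes
            self.active_nodes = max_nodes

        def resize(self, target):
            # Real systems must warm new nodes and drain old ones slowly;
            # a cold cache can stampede the database behind it.
            self.active_nodes = max(self.min_nodes,
                                    min(self.max_nodes, target))

    def desired_nodes(requests_per_sec, reqs_per_node=5000):
        # Provision for current load plus 25% headroom, not for peak.
        return int(requests_per_sec * 1.25 / reqs_per_node) + 1

    cluster = CacheCluster(min_nodes=2, max_nodes=20)
    cluster.resize(desired_nodes(requests_per_sec=12000))   # quiet period: 4 nodes
    cluster.resize(desired_nodes(requests_per_sec=90000))   # spike: clamped to 20
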

Caching is used to:

Click to read more ...

Monday
Oct 22, 2012

Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale

A lot of people seem to passionately dislike the term NewSQL, or pretty much any newly coined term for that matter, but after watching Alex Lloyd, Senior Staff Software Engineer Google, give a great talk on Building Spanner, that’s the term that fits Spanner best.

Spanner wraps the SQL + transaction model of OldSQL around the reworked bones of a globally distributed NoSQL system. That seems NewSQL to me.

As Spanner is a not so distant cousin of BigTable, the NoSQL component should be no surprise. Spanner is charged with spanning millions of machines inside any number of geographically distributed datacenters. What is surprising is how OldSQL has been embraced. In an earlier 2011 talk given by Alex at the HotStorage conference, the reason for embracing OldSQL was the desire to make it easier and faster for programmers to build applications. The main ideas will seem quite familiar:

  • There’s a false dichotomy between little complicated databases and huge, scalable, simple ones. We can have features and scale, too.
  • Complexity is conserved: it has to go somewhere, so if it’s not in the database it's pushed onto developers.
  • Push complexity down the stack so developers can concentrate on building features, not databases, not infrastructure.
  • Keys for creating a fast-moving app team: ACID transactions; global serializability; code a 1-step transaction, not a 10-step workflow (see the sketch after this list); write queries instead of code loops; joins; no user-defined conflict resolution functions; standardized sync; pay as you go, and get what you pay for: predictable performance.
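
The "1-step transaction, not a 10-step workflow" point is worth unpacking. The sketch below is hypothetical pseudocode with an invented db client, not Spanner's actual API; it only shows what cross-row ACID transactions buy you compared with hand-rolled workflow steps.

    # Without cross-row transactions, a transfer becomes a workflow:
    # debit, record intent, credit, verify, mark complete, plus retry
    # and compensation logic for a failure at every step.
    #
    # With ACID transactions and serializability, it is one step:

    def transfer(db, src, dst, amount):
        with db.transaction() as txn:   # commits atomically, or not at all
            src_balance = txn.read("accounts", key=src, column="balance")
            if src_balance < amount:
                raise ValueError("insufficient funds")
            dst_balance = txn.read("accounts", key=dst, column="balance")
            txn.update("accounts", key=src, balance=src_balance - amount)
            txn.update("accounts", key=dst, balance=dst_balance + amount)
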

Spanner did not start out with the goal of becoming a NewSQL star. Spanner started as a BigTable clone, with a distributed file system metaphor. Then Spanner evolved into a global ProtocolBuf container. Eventually Spanner was pushed by internal Google customers to become more relational and application programmer friendly.

Apparently the use of Dremel inside Google had shown developers it was possible to have OLAP with SQL at scale, and they wanted that same ease of use and time to market for their OLTP apps. It seems Google has a lot of applications to get out the door, and programmers didn’t like dealing with the real-world complexities of producing reliable products on top of an eventually consistent system.

The trick was in figuring out how to make SQL work at truly huge scales. As an indicator of how deep we still are in the empirical phase of programming, that process has taken even Google over five years of development effort. Alex said the real work has actually been in building a complex, reliable distributed system. That’s the hard part to get correct.

With all the talk about atomic clocks, etc., you might get the impression that there’s magic in the system: that you can make huge cross-table, cross-datacenter transactions on millions of records with no penalty. That is not true. Spanner is an OLTP system. It uses two-phase commit, so long and large updates will still lock and block, and programmers are still on the hook to get in and get out quickly. The idea is that these restrictions are worth the programmer productivity, and any bottlenecks that do arise can be dealt with on a case-by-case basis. From the talk I get the feeling that, over time, specialized application domains like pub-sub will be brought within Spanner's domain. While the transaction side may be conventional, except for all the global repartitioning magic happening transparently under the covers, their timestamp approach to transactions does have a lot of cool capabilities on the read path.
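
Those read-path capabilities come from every committed version carrying a timestamp: a reader can pick a snapshot time and get a consistent view across the whole database without taking locks or blocking writers. A hypothetical client sketch, not Spanner's real API; the staleness parameter and query are invented for illustration:

    import time

    def consistent_report(db, staleness_secs=10):
        # Reading slightly in the past means any replica that is caught
        # up to that timestamp can serve the read, lock-free.
        snapshot_ts = time.time() - staleness_secs
        with db.snapshot(read_timestamp=snapshot_ts) as snap:
            return snap.execute_sql(
                "SELECT region, SUM(total) FROM orders GROUP BY region")
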

As an illustration of the difficulties of scaling to a large number of replicas per Paxos group, Alex turned to a hydrology metaphor:

You could use a Spanner partition as a strongly ordered pub-sub scheme where you have read-only replicas all over the place of some partition and you are trying to use it to distribute some data in an ordered way to a lot of different datacenters. This creates different challenges. What if you are out of bandwidth to some subset of those datacenters? You don’t want data buffered in the leader too long. If you spill it to disk you don’t want to incur the seek penalty when bandwidth becomes available. It becomes like hydrology. You have all this data going to different places at different times and you want to keep all the flows moving smoothly under changing conditions. Smoothly means fewer server restarts, means better latency tail, means better programming model.

This was perhaps my favorite part of the talk. I just love the image of data flowing like water drops through millions of machines and networks, temporarily pooling in caverns of memory and disk, always splitting, always recombining, always in flux, always making progress, part of a great always flowing data cycle that never loses a drop. Just wonderful.

If you have an opportunity to watch the video I highly recommend that you do; it is really good, with very little fluff at all. The section on the use of clocks in distributed transactions is particularly well done. But, in case you are short on time, here’s a gloss of the talk:

Click to read more ...

Friday
Oct 19, 2012

Stuff The Internet Says On Scalability For October 19, 2012

It's HighScalability Time:

  • @davilagrau: YouTube, GitHub, ... Are cloud services facing an entropic limit to scalability?
  • Async all the way down? The Tyranny of the Clock: The cost of logic and memory dominated Turing's thinking, but today communication, rather than logic, should dominate our thinking. Clock-free design uses less than half as much energy per addition, about 40%, as its clocked counterpart. We can regain the efficiency of local decision making by revolting against the pervasive beat of an external clock.
  • Why Google Compute Engine for OpenStack. Smart move. Having OpenStack work inside a super charged cloud, in private clouds, and as a bridge between the two ought to be quite attractive to developers looking for some sort of ally for independence. All it will take are a few victories to cement new alliances.
  • 3 Lessons That Startups Can Learn From Facebook’s Failed Credits Experiment. I thought this was a great idea too. So what happened? Facebook did not encourage sharing (if consumers don’t have a reason to share, they won’t); Facebook never made a case for caring about Credits; Facebook discouraged its partners (developers) from supporting Credits.
  • Some patterns for fast Python by Guido van Rossum: Avoid overengineering data structures; Built-in datatypes are your friends; Be suspicious of function/method calls; Don't write Java (or C++, or Javascript, ...) in Python; Are you sure it's too slow? Profile before optimizing!; The universal speed-up is rewriting small bits of code in C, but do this only when all else fails. Great discussion in the comments. (A quick illustration of two of the patterns follows this list.)
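
As a quick illustration of two of those patterns, "built-in datatypes are your friends" and "profile before optimizing", compare a hand-rolled counting loop with collections.Counter; exact numbers will vary by machine.

    import cProfile
    from collections import Counter

    words = ("the quick brown fox jumps over the lazy dog " * 10000).split()

    def count_by_hand(ws):
        counts = {}
        for w in ws:                  # explicit Python-level loop
            if w in counts:
                counts[w] += 1
            else:
                counts[w] = 1
        return counts

    def count_builtin(ws):
        return Counter(ws)            # built-in type; the loop runs in C

    # Profile first -- don't guess where the time goes.
    cProfile.run("count_by_hand(words)")
    cProfile.run("count_builtin(words)")
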
Don't miss all that the Internet has to say on Scalability: click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Oct 18, 2012

Save up to 30% by Selecting Better Performing Amazon Instances

If you like the idea of exploiting market inconsistencies to lower your costs, then you will love this paper and video from the HotCloud '12 conference: Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2.

The conclusion is interesting and is a source of good guidance:

  • Amazon EC2 uses diversified hardware to host the same type of instance.
  • The hardware diversity results in performance variation.
  • In general, the variation between fast instances and slow instances can reach 40%. In some applications, the variation can approach 60%.
  • By selecting fast instances within the same instance type, Amazon EC2 users can achieve cost savings of up to 30%, if fast instances occur with relatively low probability. (A sketch of this selection strategy follows the list.)
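
In code, the strategy looks roughly like the sketch below. The launch_instance, run_benchmark, and terminate_instance hooks are hypothetical stand-ins for your provisioning tooling and a workload-representative test; the paper itself does not ship this code.

    def acquire_fast_instance(launch_instance, run_benchmark,
                              terminate_instance, threshold_secs,
                              max_tries=5):
        # Keep launching until an instance benchmarks under the
        # threshold; each retry costs roughly one instance-hour.
        inst = launch_instance()
        for _ in range(max_tries - 1):
            if run_benchmark(inst) <= threshold_secs:
                return inst            # fast hardware: keep it
            terminate_instance(inst)   # slow hardware: roll the dice again
            inst = launch_instance()
        return inst                    # give up and keep the last one
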

The abstract:

Click to read more ...

Wednesday
Oct 17, 2012

World of Warcraft's Lead Designer Rob Pardo on the Role of the Cloud in Games

In a really far-ranging and insightful interview by Steve Peterson, Game Industry Legends: Rob Pardo, where the future of gaming is discussed, there is a section on how the cloud might be used in games. I know there are a lot of game developers in my audience, so I thought it might be useful:

Q. If the game is free-to-play but I have to download 10 gigabytes to try it out, that can keep me from trying it. That's part of what cloud gaming is trying to overcome; do you think cloud gaming is going to make some inroads because of those technical issues?

Click to read more ...

Tuesday
Oct 16, 2012

Sponsored Post: Server Stack, Akiban, Wiredrive, NY Times, CouchConf, FiftyThree, Percona, ElasticHosts, ScaleOut, New Relic, NetDNA, GigaSpaces, AiCache, Logic Monitor, AppDynamics, CloudSigma

Who's Hiring?

  • Wiredrive is looking for a SENIOR WEB APPLICATION SYSTEMS ADMINISTRATOR and a TEST AUTOMATION ENGINEER to join our agile infrastructure team. For full job descriptions please see http://wdrv.it/QA6iTw
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • FiftyThree, the company behind the award-winning iPad app Paper, is looking for a {Backend || DevOps} Engineer to help us build our next great product: a service to "bring ideas together". http://www.fiftythree.com/jobs
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

  • Integrating location-based capabilities doesn't have to be difficult or expensive. Join us for "How to create Geospatial Indexes for Nearest Neighbor and Geofencing queries in Akiban." Register Here
  • CouchConf is a one-day, three-track event for any developer who wants to take a deeper dive into Couchbase NoSQL technology, learn where it’s headed, and build really cool stuff.
  • Percona announces MySQL training for busy professionals: Developer Training for MySQL. Percona is offering savings of over 35% for this course in the month of August.

Cool Products and Services

  • ServerStack offers the industry's most scalable managed servers for high traffic/bandwidth websites backed by unlimited 24/7 network, server and application support.
  • aiCache creates a better user experience by increasing the speed, scale and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • ElasticHosts launches white-label cloud reseller program offering 30% revenue share on fully rebranded cloud hosting.
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • Follow the Cloudify blog to learn more about our open source PaaS stack – latest integration recipes, builds, features, and other cool stuff.  Visit the GigaSpaces blog to learn how to take your application to the next level of scalability and performance.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Oct 15, 2012

Simpler, Cheaper, Faster: Playtomic's Move from .NET to Node and Heroku

This is a guest post by Ben Lowry, CEO of Playtomic. Playtomic is a game analytics service integrated into about 8,000 mobile, web and downloadable games played by approximately 20 million people daily.

Here's a good summary quote by Ben Lowry on Hacker News:

Just over 20,000,000 people hit my API yesterday 700,749,252 times, playing the ~8,000 games my analytics platform is integrated in for a bit under 600 years in total play time. That's just yesterday. There are lots of different bottlenecks waiting for people operating at scale. Heroku and NodeJS, for my use case, eventually alleviated a whole bunch of them very cheaply.

Playtomic began with an almost exclusively Microsoft .NET and Windows architecture, which held up for 3 years before being replaced with a complete rewrite using NodeJS. During its lifetime the entire platform grew from shared space on a single server to a full dedicated server, then spread to a second dedicated server, then the API was offloaded to a VPS provider on 4-6 fairly large VPSs. Eventually the API settled on 8 dedicated servers at Hivelocity, each a quad core with hyperthreading + 8 GB of RAM + dual 500 GB disks, running 3 or 4 instances of the API stack.
 
These servers routinely serviced 30,000 to 60,000 concurrent game players and received up to 1500 requests per second, with load balancing done via DNS round robin.
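
Since DNS round robin carries the whole load-balancing story here, a rough illustration of what it means: the domain publishes several A records, and clients spread themselves across them. The hostname below is a placeholder, and the random choice merely approximates the rotation real DNS servers perform.

    import random
    import socket

    def pick_api_server(hostname="api.example.com", port=80):
        # getaddrinfo returns every published A record for the name;
        # picking one at random approximates how round-robin DNS spreads
        # clients across the servers behind a single hostname.
        addrs = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        return random.choice(addrs)[4][0]   # the sockaddr tuple's IP address
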

In July, the entire fleet of servers was replaced with a NodeJS rewrite hosted on Heroku, at a significant saving.

Scaling Playtomic with NodeJS

Click to read more ...