Entries by HighScalability Team (1576)

Thursday
May102012

Paper: Paxos Made Moderately Complex

If you are a normal human being and find the Paxos protocol confusing, then this paper, Paxos Made Moderately Complex, is a great find. Robbert van Renesse from Cornell University has written a clear and well written paper with excellent explanations.

The Abstract:

For anybody who has ever tried to implement it, Paxos is by no means a simple protocol, even though it is based on relatively simple invariants. This paper provides imperative pseudo-code for the full Paxos (or Multi-Paxos) protocol without shying away from discussing various implementation details. The initial description avoids optimizations that complicate comprehension. Next we discuss liveness, and list various optimizations that make the protocol practical.

Click to read more ...

Wednesday
May092012

Cell Architectures

A consequence of Service Oriented Architectures is the burning need to provide services at scale. The architecture that has evolved to satisfy these requirements is a little known technique called the Cell Architecture.

A Cell Architecture is based on the idea that massive scale requires parallelization and parallelization requires components be isolated from each other. These islands of isolation are called cells. A cell is a self-contained installation that can satisfy all the operations for a shard. A shard is a subset of a much larger dataset, typically a range of users, for example. 

Cell Architectures have several advantages:

  • Cells provide a unit of parallelization that can be adjusted to any size as the user base grows.
  • Cell are added in an incremental fashion as more capacity is required.
  • Cells isolate failures. One cell failure does not impact other cells.
  • Cells provide isolation as the storage and application horsepower to process requests is independent of other cells.
  • Cells enable nice capabilities like the ability to test upgrades, implement rolling upgrades, and test different versions of software.
  • Cells can fail, be upgraded, and distributed across datacenters independent of other cells.

A number of startups make use of Cell Architectures:

  • Tumblr: Users are mapped into cells and many cells exist per data center. Each cell has an HBase cluster, service cluster, and Redis caching cluster. Users are homed to a cell and all cells consume all posts via firehose updates. Background tasks consume from the firehose to populate tables and process requests. Each cell stores a single copy of all posts.
  • Flickr: Uses a federated approach where all a user’s data is stored on a shard which is a cluster of different services.
  • Facebook: The Messages service has as the basic building block of their system a cluster of machines and services called a cell. A cell consists of ZooKeeper controllers, an application server cluster, and a metadata store.
  • Salesforce: Salesforce is architected in terms of pods. Pods are self-contained sets of functionality consisting of 50 nodes, Oracle RAC servers, and Java application servers. Each pod supports many thousands of customers. If a pod fails only the users on that pod are impacted.

The key to the cell is you are creating a scalable and robust MTBF friendly service. A service than can be used as a bedrock component in a system of other services coordinated by a programmable orchestration layer. It works just as well in a data center as in a cloud. If you are looking for a higher level organization pattern, the Cell Architecture is a solid choice.

Related Articles

Tuesday
May082012

Sponsored Post: Infragistics, Velocity, Reality Check Network, Gigaspaces, AiCache, ElasticHosts, Logic Monitor, Attribution Modeling, New Relic, AppDynamics, CloudSigma, ManageEnine, Site24x7

Who's Hiring? 

  • Are you looking for good people?

Fun and Informative Events

  • The DevOps PaaS Infusion Meetup NYC - Taking Mission-Critical Apps to the Cloud. You’ll hear real world use cases from Microsoft, Aditi, Cisco, GigaSpaces, and C24. Register here:  http://bit.ly/IpgpaN
  • O'Reilly Velocity, the Web Performance and Operations conference, is happening in Santa Clara, CA from June 25-27. Learn from your peers, exchange ideas with experts, and share best practices and lessons learned. Register here.
  • Sign up for this free 30-minute webinar exploring how new technology can determine which ads have been seen by users and will discuss the C3 Metrics Labs analysis of over 2 billion impressions. 

Cool Products and Services

  • Reality Check Network offers powerful hosting solutions and managed servers for high traffic/bandwidth websites backed by unlimited network, server and application support.
  • When you’re looking for the fastest, lightest, most complete toolset for rapidly building high performance Web 2.0 applications, you want NetAdvantage for ASP.NET.
  • Create your most stunning, highly performant, and completely mobile HTML5 applications and dashboards on any browser, platform or device – only with NetAdvantage for jQuery.
  • aiCache creates a better user experience by increasing the speed scale and stability of your web-site. Test aiCache acceleration for free.  No sign-up required. http://aicache.com/deploy
  • ElasticHosts award winning cloud server hosting launches across North America. Adding data centers in Los Angeles and Toronto. Free trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • New Relic - real user monitoring optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
May072012

Startups are Creating a New System of the World for IT

It remains that, from the same principles, I now demonstrate the frame of the System of the World. -- Isaac Newton

The practice of IT reminds me a lot of the practice of science before Isaac Newton. Aristotelianism was dead, but there was nothing to replace it. Then Newton came along, created a scientific revolution with his System of the World. And everything changed. That was New System of the World number one.

New System of the World number two was written about by the incomparable Neal Stephenson in his incredible Baroque Cycle series. It explores the singular creation of a new way of organizing society grounded in new modes of thought in business, religion, politics, and science. Our modern world emerged Enlightened as it could from this roiling cauldron of forces.

In IT we may have had a Leonardo da Vinci or even a Galileo, but we’ve never had our Newton. Maybe we don't need a towering genius to make everything clear? For years startups, like the frenetically inventive age of the 17th and 18th centuries, have been creating a New System of the World for IT from a mix of ideas that many thought crazy at first, but have turned out to be the founding principles underlying our modern world of IT.

If you haven’t guessed it yet, I’m going to make the case that the New System of the World for IT is that much over hyped word: cloud. I hope to show, using many real examples from real startups, that the cloud is built on a powerful system of ideas and technologies that make it a superior model for delivering IT.

IT has had an explosion of creativity: open source, deep and powerful tool chains, lean and agile development, cloud computing, virtualization, BigData, parallel programming, distributed monitoring, distributed programming, NoSQL, cost driven programming, dynamic languages, real-time processing, asynchronous programming, distributed teams, mobile platforms, viral loops, flat networks, software defined networking, wimpy cores, DevOps, everything as a service, infrastructure as code, and so on and so on. Astounding innovation wherever you look.

We are just now figuring out what new structures and systems are replacing the old, but if you step back a bit, what seems to be happening is we are creating a new “frame” using a bottom up methodology that just may be a new System of the World for IT. What is merging is a new way of working synthesised from all the diverse forces catalogued above. We’ve created a sort of new physics of development in place of a collection of prescientific alchemical lore. 

Since it is startups tackling problems that can’t be solved using traditional methods, it is through them that we’ll explore this new System of the World or IT.

It’s Not All About the Cloud, but It’s Mostly About the Cloud

Click to read more ...

Friday
May042012

Stuff The Internet Says On Scalability For May 4, 2012

It's HighScalability Time:

  • Quotable quotes:
    • Richard Feynman: Suppose that little things behave very differently than anything big
    • @orgnet: "Data, data everywhere, but not a thought to think" -- John Allen Paulos, Mathematician
    • @bcarlso: just throw out the word "scalability". That'll bring em out
    • @codypo: Here are the steps to the Scalability Shuffle. 1: log everything. 2: analyze logs. 3: profile. 4: refactor. 5: repeat.
    • @FoggSeastack: If math had been taught in a relevant way I might have been a  person today.
    • @secboffin: I know a programming joke about 10,000 mutexes, but it's a bit contentious.
  • Twitter gets personal with  Improved personalization algorithms and real-time indexing, a tale of a real-time tool chain. Earlybird is Twitter's real-time search system. Every Tweet has its URLs extracted and expanded. URL contents are fetched via SpiderDuck. Cassovary, a graph processing library, is used to find important connections. Then Twitter's search engine is used to find URLs shared in the circle of important connections. Those links are converted into stories and the stories are ranked by tweet frequency. A lot of stuff going on in near-real time.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
May022012

12 Ways to Increase Throughput by 32X and Reduce Latency by  20X

Martin Thompson, a high-performance technology geek, has written an awesome post, Fun with my-Channels Nirvana and Azul Zing. In it Martin shows the process and techniques he used to take an existing messaging product, written in Java, and increase throughput by 32X and reduce latency by  20X. The article is very well written with lots of interesting details that make it well worth reading.

The Techniques

Click to read more ...

Monday
Apr302012

Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached

The EuroSys 2012 system conference has an excellent live blog summary of their talks for: Day 1, Day 2, Day 3 (thanks Henry at the Paper Trail blog). Summaries for each of the accepted papers are here.

One of the more interesting papers from a NoSQL perspective was Cache Craftiness for Fast Multicore Key-Value Storage, a wonderfully detailed description of the low level techniques used to implement Masstree:

A storage system specialized for key-value data in which all data fits in memory, but must persist across server restarts. It supports arbitrary, variable-length keys. It allows range queries over those keys: clients can traverse subsets of the database, or the whole database, in sorted order by key. On a 16-core machine Masstree achieves six to ten million operations per second on parts A–C of the Yahoo! Cloud Serving Benchmark benchmark, more than 30 as fast as VoltDB [5] or MongoDB [2].

If you are looking for innovative detailed high performance design, this paper is for you. An example from the section on writer-writer coordination: 

Masstree writers coordinate using per-node spinlocks. A node’s lock is stored in a single bit in its version counter. Any modification to a node’s keys or values requires holding the node’s lock. Some data is protected by other nodes’ locks, however. A node’s parent pointer is protected by its parent’s lock, and a border node’s prev pointer is protected by its previous sibling’s lock. This minimizes the simultaneous locks required by split operations; when an interior node splits, for example, it can assign its children’s parent pointers without obtaining their locks.

Here's the live blog writeup:

Click to read more ...

Friday
Apr272012

Stuff The Internet Says On Scalability For April 27, 2012

It's HighScalability Time:

  • Quotable quotes:
    • The unbearable cost of the cloud? @marcoarment: My experiment with Amazon CloudSearch was going very well until I imported half of subscribers' bookmarks and saw it would cost $10k/month.
    • @JBossMike: Serious question: Do the Node.js hipsters enjoy not knowing WTF they're talking about when they talk about software scalability?
    • @timoelliott: The #Fashion industry doesn't have #bigdata -- they call it size XXL data instead...
  • Some steamy employee policies at Valve: no managers, 100% self-directed projects, crepe flat organizational structure, completely self owned.
  • Good discussion on the advisability of early stage ventures developing APIs that must be supported forever. Charlie Kindel says Don’t Build APIs, unil you get more experience with your product you'll just get them wrong and be stuck with suppoprt. Pretty much everyone else disagrees, thinking the value of APIs outshines the risk.  Insightful discussion on the Guerrilla Capacity Planning group.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Apr262012

Akaros - an open source operating system for manycore architectures

If you are interested in future foward OS designs then you might find Akaros worth a look. It's an operating system designed for many-core architectures and large-scale SMP systems, with the goals of:

  • Providing better support for parallel and high-performance applications
  • Scaling the operating system to a large number of cores 

A more indepth explanation of the motiviation behind Akaros can be found in Improving Per-Node Efficiency in the Datacenter with NewOS Abstractions by Barret Rhoden, Kevin Klues, David Zhu, and Eric Brewer.

The abstract:

We believe datacenters can benefit from more focus on per-node efficiency, performance, and predictability, versus the more common focus so far on scalability to a large number of nodes. Improving per-node efficiency decreases costs and fault recovery because fewer nodes are required for the same amount of work. We believe that the use of complex, general-purpose operating systems is a key contributing factor to these inefficiencies.

 

Traditional operating system abstractions are ill-suited for high performance and parallel applications, especially on large-scale SMP and many-core architectures. Datacenter nodes ought to run customized operating systems to help applications to utilize the full potential of the underlying hardware. We propose four key ideas that help to overcome these limitations. These ideas are built on a philosophy of exposing as much information to applications as possible and giving them the tools necessary to take advantage of that information to run more efficiently. In short, high-performance applications need to be able to peer through layers of virtualization in the software stack to optimize their behavior. We explore abstractions based on these ideas and discuss how we build them in the context of a new operating system called Akaros. 

Wednesday
Apr252012

The Anatomy of Search Technology: blekko’s NoSQL database

This is a guest post (part 2, part 3) by Greg Lindahl, CTO of blekko, the spam free search engine that had over 3.5 million unique visitors in March. Greg Lindahl was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

Imagine that you're crazy enough to think about building a search engine.  It's a huge task: the minimum index size needed to answer most queries is a few billion webpages. Crawling and indexing a few billion webpages requires a cluster with several petabytes of usable disk -- that's several thousand 1 terabyte disks -- and produces an index that's about 100 terabytes in size.

Serving query results quickly involves having most of the index in RAM or on solid state (flash) disk. If you can buy a server with 100 gigabytes of RAM for about $3,000, that's 1,000 servers at a capital cost of $3 million, plus about $1 million per year of server co-location cost (power/cooling/space.) The SSD alternative requires fewer servers, but serves a lot fewer queries per second, because SSDs are much slower than RAM.

You might think that Amazon's AWS cloud would be a great way to reduce the cost of starting a search engine. It isn't, for 4 main reasons: 

Click to read more ...