What should the architecture of your scalable, real-time, highly available service look like? There are as many options as there are developers, but if you are looking for a general template, this architecture as described by Prashant Malik, Facebook's lead for the Messages back end team, in Scaling the Messages Application Back End, is a very good example to consider.
Although Messages is tasked with handling 135+ billion messages a month, from email, IM, SMS, text messages, and Facebook messages, you may think this is an example of BigArchitecture and doesn't apply to smaller sites. Not so. It's a good, well thought out example of a non-cloud architecture exhibiting many qualities any mom would be proud of:
Qualities 1-7 are well known and fairly widely acknowledge and deployed in some form or another. The cell idea isn't as widely deployed as it ratchets up the abstraction level to 11.
Salesforce has a similar idea called a pod. For Salesforce each pod consists of 50 nodes, Oracle RAC servers, and Java application servers. Each pod supports many thousands of customers. We've seen other products bundle shards in a similar way. A shard would consist of a database cluster and storage that hides a master-slave or a master-master configuration into a highly available and scalable unit. Flickr is an early and excellent example of this strategy.
The really interesting bits of the article is how cluster management within a cell is handled and how the management of the cells as a unit is handled. That's the hard part to get right.
In a cluster of machines, nodes and thus the services on those nodes can twinkle in and out of existence at any time. Configuration can change also change at any time? How do you keep all systems in the cell coordinated? ZooKeeper. ZooKeeper is used for high availability, sharding, failover, and services discovery. All essential features for a cluster of fallible machines. Within a cell application servers form a consistent hash ring and users are shard across these servers.
What about cross cell operations, how are they handled? Messages has a Discovery Service that is a kind of User DNS, it sits above the cells and maintains a user to cell mappings. If you want to access user data you have to find the correct cell first using the service. In the cell a distributed logic client acts as the interface to the Discovery Service and processes ZooKeeper notifications.
The article is very well written and has a lot of detail. Well worth reading, especially if you are trying to figure out how to structure your own service.