Saturday
Apr052008
Skype Plans for PostgreSQL to Scale to 1 Billion Users

Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.
- Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key.
- For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash.
- Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:
First, code to insert a user in a database:
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
BEGIN
PERFORM 1 FROM users WHERE username = i_username;
IF NOT FOUND THEN
INSERT INTO users (username) VALUES (i_username);
RETURN 'user created';
ELSE
RETURN 'user already exists';
END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Heres the proxy code to distribute the user insert to the correct partition:
queries=#
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
CLUSTER 'queries'; RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
Your SQL query looks normal:
SELECT insert_user("username");
- The result of a query is exactly that same as if was executed on the remote database.
- Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
- PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them.
- More partitions can be added to improve performance.
- Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
- Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains."
- To solve this problem Skype developed their on lighter weight queueing and replication toolkit called SkyTools.
The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the make another level of indirection school of problem solving, which has advantages:
The downsides are:
It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.
Reader Comments (8)
With hashing how does Skype handle re-partitioning (e.g. when adding or taking DBs)
I think it's this one: http://tinyurl.com/5fl8hu
plProxy allows for any hashing algorithm. The examples use the default PostgreSQL hashtext() function, per a previous comment. You can however roll your own if you're so inclined. plProxy is very flexible by its nature. The configuration and utilization is all done through stored procedures/functions.
So they are taking their figure to one billion users. I guess it would be hardly around 5 years when skype would defintly go for a trillion or even more as the way internet is grooming..
-----
http://underwaterseaplants.awardspace.com">sea plants
http://underwaterseaplants.awardspace.com/seagrapes.htm">Sea grapes...http://underwaterseaplants.awardspace.com/plantroots.htm">Plant roots
The above comment about trillion users is funny because there are only 6.7 billion people in the world right now and the question is not if skype can support a trillion users but will our planet support a trillion people :D
good article,
we have now 2 x PgBouncer, 12 x nodes and 12 x WAL backup servers and everything works fine.
the servers are 1u twin, dual quad core intel xeon with 32gb ram and 2 x sas in hardware raid1 on each twin.
---
http://www.unixvps.com
I woluld like to know why you've chosen PostgreSQL instead of for example MySQL?
If you need anything transactional (not MyISAM) then PostgreSQL scales far better then MySQL with InnoDB.
I guess also that some people within Skype knew PG better then MySQL and PG has far better support for stored procedures, triggers etc,
Ries