« Lazy web sites run faster | Main | Development of highly scalable web site »
Saturday
Apr052008

Skype Plans for PostgreSQL to Scale to 1 Billion Users

Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.

  • Skype's goal is an architecture that can handle 1 billion plus users. This level of scale isn't practically solvable with one really big computer, so our masked superhero horizontal scaling comes to the rescue.
  • Hardware is dual or quad Opterons with SCSI RAID.
  • Followed common database progression: Start with one DB. Add new databases partitioned by functionality. Replicate read-mostly data for better read access. Then horizontally partition data across multiple nodes..
  • In a first for this blog anyway, Skype uses a traditional database architecture where all database access is encapsulated in stored procedures. This allows them to make behind the scenes performance tweaks without impacting frontend servers. And it fits in cleanly with their partitioning strategy using PL/Proxy.
  • PL/Proxy is used to scale the OLTP portion of their system by creating a horizontally partitioned cluster:

    - Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key.
    - For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash.
    - Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:

    First, code to insert a user in a database:
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
    BEGIN
    PERFORM 1 FROM users WHERE username = i_username;
    IF NOT FOUND THEN
    INSERT INTO users (username) VALUES (i_username);
    RETURN 'user created';
    ELSE
    RETURN 'user already exists';
    END IF;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    Heres the proxy code to distribute the user insert to the correct partition:
    queries=#
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
    CLUSTER 'queries'; RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;

    Your SQL query looks normal:
    SELECT insert_user("username");

    - The result of a query is exactly that same as if was executed on the remote database.
    - Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
  • They like PL/Proxy approach for OLTP because:
    - PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them.
    - More partitions can be added to improve performance.
    - Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
  • PgBouncer is used as a connection pooler for PostgreSQL. PL/Proxy "somewhat wastes connections as it opens connection to each partition from each backend process" so the pooler helps reduce the number of connections.
  • Hot-standby servers are created using WAL (Write Ahead Log) shipping. It doesn't appear that these servers can be used for read-only operations.
  • More sophisticated organizations often uses an OLTP database system to handle high performance transaction needs and then create seperate systems for more non-transactional needs. For example, an OLAP (Online analytical processing) system is often used for handling complicated analysis and reporting problems. These differ in schema, indexing, etc from the OLTP system. Skype also uses seperate systems for the presentation layer of web applications, sending email, and prining invoices. This requires data be moved from the OLTP to the other systems.
    - Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains."
    - To solve this problem Skype developed their on lighter weight queueing and replication toolkit called SkyTools.

    The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the make another level of indirection school of problem solving, which has advantages:
  • Applications are independent of the structure of the database servers. That's encapsulated in the proxy servers.
  • Applications do not need to change in response to partition, mapping, or other changes.
  • Load balancing, failover, and read/write splitting are invisible to applications.

    The downsides are:
  • Reduced performance. Another hop is added and queries must be parsed to perform all the transparent magic.
  • Inability to perform joins and other database operations across partitions.
  • Added administration complexity of dealing with proxy configuration and HA for the proxy servers.

    It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.

    Related Articles

  • An Unorthodox Approach to Database Design : The Coming of the Shard
  • PostgreSQProducts - Scaling infinitely with PL/Proxy
  • PL/Proxy
  • Heroku also uses PostgreSQL.
  • MySQL Proxy
  • PostgreSQL cluster: partitioning with plproxy (part I) by Kristo Kaiv'.
  • PostgreSQL cluster: partitioning with plproxy (part II) by Kristo Kaiv'.
  • PostgreSQL at Skype.
  • Skytools database scripting framework & PgQ by Kristo Kaiv'.
  • PostgreSQL High Availability.
  • Reader Comments (8)

    With hashing how does Skype handle re-partitioning (e.g. when adding or taking DBs)

    December 31, 1999 | Unregistered CommenterAnonymous

    I think it's this one: http://tinyurl.com/5fl8hu

    December 31, 1999 | Unregistered CommenterTodd Hoff

    plProxy allows for any hashing algorithm. The examples use the default PostgreSQL hashtext() function, per a previous comment. You can however roll your own if you're so inclined. plProxy is very flexible by its nature. The configuration and utilization is all done through stored procedures/functions.

    December 31, 1999 | Unregistered CommenterGavin M. Roy

    So they are taking their figure to one billion users. I guess it would be hardly around 5 years when skype would defintly go for a trillion or even more as the way internet is grooming..
    -----
    http://underwaterseaplants.awardspace.com">sea plants
    http://underwaterseaplants.awardspace.com/seagrapes.htm">Sea grapes...http://underwaterseaplants.awardspace.com/plantroots.htm">Plant roots

    December 31, 1999 | Unregistered Commenterfarhaj

    The above comment about trillion users is funny because there are only 6.7 billion people in the world right now and the question is not if skype can support a trillion users but will our planet support a trillion people :D

    December 31, 1999 | Unregistered CommenterAnonymous

    good article,

    we have now 2 x PgBouncer, 12 x nodes and 12 x WAL backup servers and everything works fine.
    the servers are 1u twin, dual quad core intel xeon with 32gb ram and 2 x sas in hardware raid1 on each twin.

    ---
    http://www.unixvps.com

    December 31, 1999 | Unregistered Commenterunixvps

    I woluld like to know why you've chosen PostgreSQL instead of for example MySQL?

    December 31, 1999 | Unregistered Commenternc

    If you need anything transactional (not MyISAM) then PostgreSQL scales far better then MySQL with InnoDB.

    I guess also that some people within Skype knew PG better then MySQL and PG has far better support for stored procedures, triggers etc,

    Ries

    December 31, 1999 | Unregistered CommenterRies van Twisk

    PostPost a New Comment

    Enter your information below to add a new comment.
    Author Email (optional):
    Author URL (optional):
    Post:
     
    Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>