Monday, March 30, 2009

Lavabit Architecture - Creating a Scalable Email Service

Ladar Levison of Lavabit has written an incredible article on how they took a centralized off-the-shelf email server that could handle only a few thousand users and built their own custom distributed infrastructure for handling hundreds of thousands of email users. Lavabit processes 70 gigabytes of data per day, is made up of 26 servers, hosts 260,000 email addresses, and processes 600,000 emails a day. That's a lot of email.

Lavabit's mission has a little edge to it too:

Lavabit was founded as a direct reaction to the larger free e-mail services available. We felt it was possible to create an e-mail service that was fast, reliable, feature rich and didn't achieve profitability by prostituting its user base to marketers.

What I really like about this article is that Lavabit has some challenging elements in dealing with different email protocols while being able to scale to a lot of users. There's more going on than just trying to scale out a database. Many products contain complicated bits like this, so it's interesting to see how Ladar handled them. There are lots of useful details that will help anyone build their own system. Putting in this extra work is what Ladar thinks makes Lavabit different:

One of the ways to gain an advantage over your competition is to invest the time and money needed to build systems that are better than what is easily available to your competition. It is the custom platform we developed that has allowed us to thrive while many other free email companies either stopped offering their service for free, or shut down altogether.

Since Ladar was so thorough, I saved the article as a separate HTML file. Please select the visit link to read the entire article. I'd like to thank Ladar again for taking the time and making the effort to document their architecture so the community at large can learn from it.

Reader Comments (9)

Ahem, 260K user accounts and 600K messages/day is a _small_ installation.

Scaling up a mail system is easy. Making it reliable is harder. You have to scale up/out a few essential components:

(a) The frontend MX systems.
(b) The mail spool, where messages are actually stored.
(c) The POP3 (and IMAP) services.
(d) User information databases
(e) Custom frontend systems, if any.

Scaling out MX systems is easy. They accept mail from remote systems, make various anti-spam/anti-malware checks and send mail onwards.

One trick I recommend here is to rewrite the SMTP envelope recipient address to an address which only works within the cluster. This allows you to use MX records within the cluster as well for message routing.
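
A minimal sketch of that rewrite, in Python. The host names, the USER_NODE table and the rewrite_recipient function are all hypothetical and only illustrate the idea; a real MX would read the assignment from the user information database and perform the rewrite inside the MTA.

```python
# Hypothetical user-to-node assignment. A real cluster would read this from
# the user information database rather than from a hard-coded dict.
USER_NODE = {
    "alice": "store1.mail.internal",
    "bob": "store2.mail.internal",
}


def rewrite_recipient(rcpt, user_node=USER_NODE):
    """Rewrite user@public-domain to user@storage-node.

    The returned address is only meaningful inside the cluster, where DNS
    (for example an internal MX record) resolves the storage node name.
    """
    local_part, sep, domain = rcpt.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not a valid envelope address: {rcpt!r}")
    mailbox = local_part.lower()
    node = user_node.get(mailbox)
    if node is None:
        raise KeyError(f"unknown mailbox: {mailbox!r}")
    return f"{mailbox}@{node}"


if __name__ == "__main__":
    print(rewrite_recipient("Alice@example.com"))
```

This sketch drops the public domain, which assumes a single hosted domain; a multi-domain setup would need to carry the domain along in the rewritten address.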

Scaling out storage is easy. Add more independent storage nodes. The storage nodes will block on disk I/O, not on space. The fix is to use lots of small, fast disks. Mail is routed to each independent set of disks via MX records in DNS. Making these HA is harder. RAID helps. So does storage that is detached from a physical host, so that a standby server can take up the load.

User information can be dealt with in multiple ways:
(a) Single master, multiple slave replication is the easiest with an RDBMS or LDAP.
(b) You could write some custom code to export data from LDAP/RDBMS to CDB tables which provide fast local lookups and don't involve connection blocking/pooling.
(c) A combination of the two.
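
Option (b) can be sketched roughly as follows, in Python. This is an illustration under stated assumptions: the standard library's dbm.dumb module stands in for a real CDB writer, and export_users, lookup and the sample rows are hypothetical names and data, not anything from the article.

```python
import dbm.dumb
import os
import tempfile


def export_users(rows, path):
    """Write (address, storage_node) pairs to a local key-value file.

    `rows` would normally come from a query against the master RDBMS or
    LDAP directory; here it is any iterable of pairs.
    """
    # Flag "n" always creates a new, empty table.
    with dbm.dumb.open(path, "n") as db:
        for address, node in rows:
            db[address.lower().encode("utf-8")] = node.encode("utf-8")


def lookup(path, address):
    """Return the storage node for an address, or None if it is unknown."""
    # A purely local read: no network round trip and no connection pool.
    with dbm.dumb.open(path, "r") as db:
        value = db.get(address.lower().encode("utf-8"))
    return value.decode("utf-8") if value is not None else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        table = os.path.join(workdir, "users")
        export_users([("alice@example.com", "store1.mail.internal")], table)
        print(lookup(table, "Alice@example.com"))
```

The trade-off is freshness: lookups only see changes made before the last export, so the export has to be re-run and redistributed whenever user data changes.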

Frontend systems are just as easy to scale out, since they are essentially stateless. Add another box to scale out.

December 31, 1999 | Unregistered Commenter Devdas Bhagat

26 servers for 600k emails daily? Meh, I was doing that 6 years ago on one cyrus server and two for webmail.

Cyrus really rocks for high volume setups. Maybe you should ask fastmail.fm folks about their magic.

December 31, 1999 | Unregistered Commenter pegasus

That 600,000 email number is questionable. Is that only the legit mail?

Thanks for providing this site.

December 31, 1999 | Unregistered Commenter rob

I wrote the article so I can field the questions above. Todd took the number of messages we accept (200k) plus the number of messages we reject (400k) to come up with 600k messages per day. We also mentioned that we have 26 servers, but only 14 are dedicated to mail, and of those 10 are Dell 1650 app servers, and interchangeable.
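
For anyone who wants to check the arithmetic, here is a short Python sketch using only the figures quoted in this post and thread. The per-second rate is a daily average and says nothing about peak load, which is likely burstier.

```python
# Figures quoted in the post and in this thread.
ACCEPTED_PER_DAY = 200_000
REJECTED_PER_DAY = 400_000
ADDRESSES = 260_000
SECONDS_PER_DAY = 24 * 60 * 60

attempts_per_day = ACCEPTED_PER_DAY + REJECTED_PER_DAY
rejected_share = REJECTED_PER_DAY / attempts_per_day
attempts_per_second = attempts_per_day / SECONDS_PER_DAY
accepted_per_address = ACCEPTED_PER_DAY / ADDRESSES

print(f"{attempts_per_day:,} delivery attempts per day")
print(f"{rejected_share:.1%} of attempts rejected")
print(f"{attempts_per_second:.2f} attempts per second on average")
print(f"{accepted_per_address:.2f} accepted messages per address per day")
```

So roughly two out of three delivery attempts are rejected, and the accepted volume works out to less than one message per address per day.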

Devdas is wrong when he says 260k users is a small installation. Of all the mail setups worldwide, surely less than 1% have 260k users or more. But Devdas is right when he says making a mail system reliable is hard. Even with the setup he proposed, if one of the storage nodes goes down, all of the users assigned to that node are out of luck. His proposed system is also complex, with at least 4-5 critical parts (that I can count).

Fastmail uses a Cyrus setup, and yes Cyrus does rock. There are some good articles on the Fastmail blog that interested readers may want to read. We just decided to go a different direction with our system.

December 31, 1999 | Unregistered Commenter ladar

Why not split the password from the encryption key? Seems like you could auto-generate the encryption key from the password and then store that as well, allowing you to support Secure Password and still ECC-encrypt all the emails in the same manner that you're doing now?
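
One way to read this suggestion, sketched in Python. This is not Lavabit's actual scheme. It only shows how a login verifier and a separate encryption key could both be derived from one password with PBKDF2. The function names, the purpose labels and the iteration count are assumptions, and the sketch does not store the derived key, so that part of the suggestion is left open.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, an assumption


def derive(password, salt, purpose):
    """Derive a 32-byte value from the password for one specific purpose."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt + purpose, ITERATIONS, dklen=32
    )


def enroll(password):
    """Create the record a server would store: a salt and a login verifier."""
    salt = os.urandom(16)
    return {"salt": salt, "verifier": derive(password, salt, b"auth")}


def login(record, password):
    """Check the password; on success return the separate encryption key."""
    candidate = derive(password, record["salt"], b"auth")
    if not hmac.compare_digest(candidate, record["verifier"]):
        return None
    # Different purpose label, so the key differs from the stored verifier.
    return derive(password, record["salt"], b"key")
```

Because the two values use different purpose labels, the stored verifier is not the encryption key itself.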

December 31, 1999 | Unregistered Commenter Toby DiPasquale

Ladar, any mail system has those critical parts. The idea is to use loose coupling between all the moving parts so that you can
(a) Scale them up more easily
(b) Make the system more HA by adding stateless redundancy.

DNS is trivial to make redundant.

MX/antispam servers are easily cloned. RAID on the disk should minimise the chance of mail loss if the disk(s) go down (It is a non-zero number though).

Databases are usually easy to replicate in a master/slave configuration; multi-master is harder. Failover is also trivial in the master/slave setup.

The only place in my configuration where non-replicated state exists in the system is the mail spool, and you could set up DRBD or equivalent to have two independent devices for redundancy of the spool itself, along with RAID to make each node slightly more failure tolerant. This is the most complex part of the setup, due to the multiple different mechanisms for HA and the statefulness of the datastore.

Accessing the spool is stateless (NFS with Maildir is good for this).

As for 260K users being small, my previous gig was at a shop with ~ 33M accounts. 260K is bigger than the median, but on the low side of the mean.
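
To see how a figure can sit above the median and below the mean at the same time, here is a small Python sketch. The list of installation sizes is made up for illustration; only 260,000 and 33,000,000 are numbers mentioned in this thread.

```python
from statistics import mean, median

# Made-up installation sizes (accounts per mail system), illustration only.
# Only 260_000 and 33_000_000 are figures mentioned in this thread.
sizes = [500, 5_000, 50_000, 260_000, 33_000_000]

mid = median(sizes)
avg = mean(sizes)

print(f"median: {mid:,.0f} accounts")
print(f"mean:   {avg:,.0f} accounts")
print(f"260,000 above median: {260_000 > mid}")
print(f"260,000 below mean:   {260_000 < avg}")
```

A few very large installations pull the mean far above the median, which is the effect the 33M-account example points at.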

December 31, 1999 | Unregistered Commenter Devdas Bhagat

Devdas,

Do you have some time for another gig?
I am looking for someone to help us build the next email system.

The current one is more or less like http://www.worldsoft-postmaster.info/

best regards

December 31, 1999 | Unregistered Commenter atif.ghaffar

Are you really sure you got the numbers right there? :-)

There are only 600k mail delivery attempts a day for 260k users?!

Over at perl.org/cpan.org/develooper.com/... we have ~800k incoming connections to our mail servers a day. The vast majority, of course, is spam, but still ...

- ask

December 31, 1999 | Unregistered Commenter Ask Bjørn Hansen

There are only 600k mail delivery attempts a day for 260k users?!

http://www.boediger.net/

December 31, 1999 | Unregistered Commenter boediger
