High Scalability -

Recommend Building a Social Music Service Using AWS, Scala, Akka, Play, MongoDB, and Elasticsearch (Email)

This action will generate an email recommending this article to the recipient of your choice. Note that your email address and your recipient's email address are not logged by this system.

Email Article Link

The email sent will contain a link to this article, the article title, and an article excerpt (if available). For security reasons, your IP address will also be included in the sent email.

Article Excerpt:

This is a guest repost by Rotem Hermon, former Chief Architect for serendip.me, on the architecture and scaling considerations behind making a startup music service.

serendip.me is a social music service that helps people discover great music shared by their friends, and also introduces them to their “music soulmates” - people outside their immediate social circle that shares a similar taste in music.

Serendip is running on AWS and is built on the following stack: scala (and some Java), akka (for handling concurrency), Play framework (for the web and API front-ends), MongoDB and Elasticsearch.

Choosing the stack

One of the challenges of building serendip was the need to handle a large amount of data from day one, since a main feature of serendip is that it collects every piece of music being shared on Twitter from public music services. So when we approached the question of choosing the language and technologies to use, an important consideration was the ability to scale.

The JVM seemed the right basis for our system as for its proven performance and tooling. It's also the language of choice for a lot of open source system (like Elasticsearch) which enables using their native clients - a big plus.

When we looked at the JVM ecosystem, scala stood out as an interesting language option that allowed a modern approach to writing code, while keeping full interoperability with Java. Another argument in favour of scala was the akka actor framework which seemed to be a good fit for a stream processing infrastructure (and indeed it was!). The Play web framework was just starting to get some adoption and looked promising. Back when we started, at the very beginning of 2011, these were still kind of bleeding edge technologies. So of course we were very pleased that by the end of 2011 scala and akka consolidated to become Typesafe, with Play joining in shortly after.

MongoDB was chosen for its combination of developer friendliness, ease of use, feature set and possible scalability (using auto-sharding). We learned very soon that the way we wanted to use and query our data will require creating a lot of big indexes on MongoDB, which will cause us to be hitting performance and memory issues pretty fast. So we kept using MongoDB mainly as a key-value document store, also relying on its atomic increments for several features that required counters.
With this type of usage MongoDB turned out to be pretty solid. It is also rather easy to operate, but mainly because we managed to avoid using sharding and went with a single replica-set (the sharding architecture of MongoDB is pretty complex).

For querying our data we needed a system with full blown search capabilities. Out of the possible open source search solutions, Elasticsearch came as the most scalable and cloud oriented system. Its dynamic indexing schema and the many search and faceting possibilities it provides allowed us to build many features on top of it, making it a central component in our architecture.

We chose to manage both MongoDB and Elasticsearch ourselves and not use a hosted solution for two main reasons. First, we wanted full control over both systems. We did not want to depend on another element for software upgrades/downgrades. And second, the amount of data we process meant that a hosted solution was more expensive than managing it directly on EC2 ourselves.

Some numbers

Article Link:

Your Name:

Your Email:

Recipient Email:

Message: