Zoosk - The Engineering behind Real Time Communications
Monday, August 27, 2012 at 9:15AM
HighScalability Team in Example

This is a guest post by Peter Offringa, VP of engineering at Zoosk. Zoosk is a 50 million member romantic social network.

Our members get the most rewarding experience from Zoosk when they can interact in real-time. After all, a future relationship is potentially at the other end of every connection a user makes. The excitement and richness of this situation can only be fully realized in real-time. The suite of Zoosk services that facilitates these interactions is referred to collectively as real-time communications (RTC). These communications are delivered using the XMPP protocol, which also powers other popular instant messaging products. Zoosk members experience real-time communications within three distinct interactions:

These communications are currently delivered to users on all major Zoosk products – the Zoosk.com site and Facebook app through a web browser, the iPhone app, iPad, Android, and a downloadable desktop application.

RTC Infrastructure

These RTC services are delivered through a high-performance, scalable XMPP-based infrastructure. The chat server, powered by the open source Jabber server Tigase, is the heart of this service. Tigase is written in Java, and our Platform team has created a number of custom extensions that handle Zoosk-specific business logic.

Tigase is deployed on standard 8-CPU, Linux-based application server class machines. The Tigase servers are configured in paired clusters, with a primary and secondary node managed through a load balancer. At any given time, all connections are directed to the primary node. If the service check against the primary server fails, the load balancer immediately begins redirecting user traffic to the secondary server.
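The primary/secondary routing rule described above can be sketched in a few lines of Python. This is an illustration only, not the load balancer's actual configuration: the node names and the `is_healthy` callback are hypothetical stand-ins for whatever service check the load balancer runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical host label, e.g. "chat-01a"


def pick_node(primary: Node, secondary: Node,
              is_healthy: Callable[[Node], bool]) -> Node:
    """Send all traffic to the primary unless its service check fails."""
    return primary if is_healthy(primary) else secondary


# Simulate a service check backed by a set of failed hosts.
failed = set()
primary, secondary = Node("chat-01a"), Node("chat-01b")


def service_check(node: Node) -> bool:
    return node.name not in failed


before = pick_node(primary, secondary, service_check)  # primary is up
failed.add("chat-01a")
after = pick_node(primary, secondary, service_check)   # primary is down
```

Because the secondary receives no traffic until the check fails, the pair behaves as active/passive rather than sharing load.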

There are 18 of these paired clusters, each handling 4,000 to 8,000 connections at any time. In addition to socket connections for transmitting XMPP traffic, Tigase also includes a service for supporting BOSH connections over HTTP.
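For a sense of scale, the per-cluster figures above imply a fleet-wide range of concurrent connections. The short calculation below uses only the two numbers quoted in this article; the function name is made up for the example.

```python
CLUSTERS = 18                      # paired clusters, from the article
PER_CLUSTER_RANGE = (4_000, 8_000)  # connections per cluster, from the article


def fleet_range(clusters: int, per_cluster: tuple) -> tuple:
    """Multiply the per-cluster low/high bounds by the number of clusters."""
    low, high = per_cluster
    return clusters * low, clusters * high


low, high = fleet_range(CLUSTERS, PER_CLUSTER_RANGE)
print(f"{low:,} to {high:,} concurrent connections")  # 72,000 to 144,000
```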

BOSH is the protocol that lets web browsers on Zoosk.com and our Facebook app maintain a persistent connection to Tigase. Our desktop application and mobile apps use standard TCP/IP socket connections.
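To make the BOSH side more concrete, here is a simplified sketch of the session creation request a browser client would POST over HTTP, following the general shape defined in XEP-0124. The domain `chat.example.com` and the `rid` value are placeholders, and a real client adds further attributes and follow-up requests that are omitted here.

```python
import xml.etree.ElementTree as ET

BOSH_NS = "http://jabber.org/protocol/httpbind"  # namespace from XEP-0124


def session_request(domain: str, rid: int, wait: int = 60, hold: int = 1) -> str:
    """Build the <body/> wrapper that opens a BOSH session.

    wait: longest time (seconds) the server may hold a request open.
    hold: number of requests the server may keep waiting at once.
    """
    body = ET.Element("body", {
        "xmlns": BOSH_NS,
        "to": domain,
        "rid": str(rid),   # request ID, incremented on each later request
        "wait": str(wait),
        "hold": str(hold),
        "ver": "1.6",
        "xml:lang": "en",
    })
    return ET.tostring(body, encoding="unicode")


# Placeholder values for illustration only.
request_xml = session_request("chat.example.com", rid=1001)
```

The server holds the HTTP request open until it has data to send or the `wait` period expires, which is what gives the browser the effect of a persistent connection.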

A user’s online state is tracked in real-time by the Tigase servers via persistent connections between Tigase and the client applications (web browser, mobile device, desktop application).  Many core Zoosk product features, including search results, profile views and messaging, require ensuring that this state is reflected in near real-time on all client applications. To keep this state consistent throughout the rest of the Zoosk infrastructure, the user’s record in the user database is updated to reflect their current online state including a timestamp of their most recent online transition.

The user’s online state is also stored in cache on our search infrastructure, so that search results can take online state into account. Zoosk search functionality is powered by a tier of SOLR servers. We have extended each SOLR server to include an ehcache instance to store those users who are online currently. This cache of online state is updated in real-time through a dedicated Tigase instance referred to as the Online State Manager (OSM).

The OSM receives custom XMPP packets indicating the user’s online state from the primary Tigase chat servers and then makes a network call to update the ehcache instance on each of the SOLR servers. There are roughly 8,000 of these online state transitions a minute during peak traffic.  Maintaining this cache outside of the SOLR index allows the user’s presence state to be updated in real-time, separate from the periodic index replication snaps from master to slave. The user’s presence state is then combined with search results at query time to either filter or rank results based on whether the user is online currently. The search algorithm prefers users who are online, as this encourages real-time communication and provides a richer experience for other users.
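The idea of keeping presence in a separate cache and merging it into results at query time can be sketched as follows. This is a minimal in-memory model, not the ehcache or SOLR integration itself; the class and function names are hypothetical.

```python
class OnlineStateCache:
    """In-memory stand-in for the per-search-server cache of online users."""

    def __init__(self):
        self._online = set()

    def apply_transition(self, user_id: int, online: bool) -> None:
        """Apply one online/offline transition pushed by the state manager."""
        if online:
            self._online.add(user_id)
        else:
            self._online.discard(user_id)

    def is_online(self, user_id: int) -> bool:
        return user_id in self._online


def filter_online(results: list, cache: OnlineStateCache) -> list:
    """Keep only users who are currently online."""
    return [uid for uid in results if cache.is_online(uid)]


def rank_online_first(results: list, cache: OnlineStateCache) -> list:
    """Move online users ahead of offline ones.

    sorted() is stable, so the original relevance order is preserved
    within the online group and within the offline group.
    """
    return sorted(results, key=lambda uid: not cache.is_online(uid))


cache = OnlineStateCache()
for uid in (2, 4, 5):
    cache.apply_transition(uid, online=True)
cache.apply_transition(5, online=False)  # user 5 goes offline again

index_results = [1, 2, 3, 4]  # hypothetical ids in relevance order
ranked = rank_online_first(index_results, cache)
filtered = filter_online(index_results, cache)
```

Because presence lives outside the index, a transition takes effect on the next query without waiting for index replication.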
 

User interactions with the Zoosk service outside of the core RTC features can also trigger business logic that generates a real-time notification to a connected user. For example, if another user views our user’s profile, or accepts our user’s friend request, we want to notify our user of that action immediately. The PHP-based web application triggers an asynchronous job that opens a network connection to a Tigase server and passes it an XMPP data packet with a custom message payload providing the data for the notification. This packet is processed by Tigase and routed to the client application from which the user is currently connected.
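A custom notification packet of this kind might look like the stanza built below. The article does not publish the real payload format, so the `urn:example:notify` namespace, the `notification` element, its attributes, and the choice of the `headline` message type are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NOTIFY_NS = "urn:example:notify"  # hypothetical payload namespace


def build_notification(to_jid: str, kind: str, count: int) -> str:
    """Wrap a custom payload in an XMPP <message/> stanza."""
    message = ET.Element("message", {"to": to_jid, "type": "headline"})
    ET.SubElement(message, "notification", {
        "xmlns": NOTIFY_NS,
        "kind": kind,         # e.g. "profile_view" (made-up value)
        "count": str(count),  # new badge value for the client to display
    })
    return ET.tostring(message, encoding="unicode")


# Placeholder recipient address for illustration only.
stanza = build_notification("user123@chat.example.com", "profile_view", 3)
```

The client application would read the payload attributes to decide whether to show a toast or update a badge.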

The user’s client application then processes this custom packet and displays the appropriate “toast” to the user or updates a “badge” reflecting the current value of a particular feature indicator (number of profile views, unread messages, etc.). If the user is offline at the time, Tigase stores the packet until the user reconnects, at which point it passes the custom packet to the user’s client application.
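The store-until-reconnect behavior can be modeled with a small router that queues packets for offline users. This is a toy sketch of the general technique, not Tigase's implementation; the `Router` class and its method names are invented for the example, and a real server would persist the queue rather than hold it in memory.

```python
from collections import defaultdict, deque
from typing import Callable


class Router:
    """Deliver packets to connected users; queue them for offline users."""

    def __init__(self):
        self._sessions = {}                  # user -> delivery callback
        self._offline = defaultdict(deque)   # user -> pending packets

    def connect(self, user: str, deliver: Callable[[str], None]) -> None:
        """Register the session, then flush stored packets in arrival order."""
        self._sessions[user] = deliver
        pending = self._offline.pop(user, deque())
        while pending:
            deliver(pending.popleft())

    def disconnect(self, user: str) -> None:
        self._sessions.pop(user, None)

    def route(self, user: str, packet: str) -> None:
        deliver = self._sessions.get(user)
        if deliver is not None:
            deliver(packet)                     # user is online
        else:
            self._offline[user].append(packet)  # hold until reconnect


router = Router()
inbox = []
router.route("alice", "profile_view")    # alice is offline: stored
router.route("alice", "friend_accept")   # stored behind the first packet
router.connect("alice", inbox.append)    # both packets are flushed
router.route("alice", "new_message")     # delivered immediately
```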

Monitoring and Testing

The Zoosk technical operations team has built a number of ways to test and monitor the health of the RTC infrastructure to ensure responsiveness and availability. These tests primarily involve various mechanisms to gather performance data from Tigase servers, or to simulate real user interactions. If a particular health check fails or performance data falls outside of established thresholds, our Nagios installation will generate an alert.
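A threshold check of the kind described above could be written as a small function that maps a metric to a status code. The 0/1/2 return values follow the standard Nagios plugin exit-code convention (OK, WARNING, CRITICAL); the metric name and threshold values are hypothetical, since the article does not list the actual checks.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin status codes


def check_threshold(name: str, value: float, warn: float, crit: float):
    """Return (status_code, message) for a metric where higher is worse."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name}={value} (crit>={crit})"
    if value >= warn:
        return WARNING, f"WARNING - {name}={value} (warn>={warn})"
    return OK, f"OK - {name}={value}"


# Hypothetical metric and thresholds, for illustration only.
code, message = check_threshold("connections", 7500, warn=7000, crit=8000)
```

A real plugin would print the message and exit with the status code so that Nagios can raise the corresponding alert.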

What’s Next

Looking forward, we will continue to actively explore new ways to leverage the real-time experience for Zoosk members. We will be rolling out RTC support to our mobile web application (Touch) in the next month.  Other devices or mediums that deliver the Zoosk application will similarly be connected in real-time. As our members increase the amount of time they are actively connected to Zoosk applications, we plan to enhance our RTC-based features to facilitate easier discovery and communication between members.

 

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.