Thursday, January 10, 2008

Letting Clients Know What's Changed: Push Me or Pull Me?


I had a false belief
I thought I came here to stay
We're all just visiting
All just breaking like waves
The oceans made me, but who came up with me?
Push me, pull me, push me, or pull me out.

So true, Pearl Jam (Push Me, Pull Me lyrics), so true. I too have wondered how web clients should be notified of model changes. Should servers push events to clients or should clients pull events from servers? A topic worthy of its own song if ever there was one.

To pull events the client simply starts a timer and makes a request to the server. This is polling. You can either pull a complete set of fresh data or just a list of changes. The server "knows" if anything you are interested in has changed and makes those changes available to you. Knowing what has changed can be relatively simple with a publish-subscribe type backend, or it can get very complex with fine-grained bitmaps of attributes and per-client state tracking what each client still needs to see.
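To make the pull model concrete, here's a minimal client-side polling sketch in TypeScript. The /api/changes endpoint, the since parameter, and the Change shape are invented purely for illustration; the point is just the timer-driven request loop over a list of changes.

```typescript
// Minimal polling client. The endpoint and response shape are hypothetical;
// what matters is the timer-driven request loop over a list of changes.
interface Change {
  id: string;
  updatedAt: number; // server timestamp of the change
  payload: unknown;
}

let lastSeen = 0; // high-water mark of changes already applied

async function pollOnce(): Promise<void> {
  // Ask only for changes newer than what we've seen (the "list of changes" style).
  const res = await fetch(`/api/changes?since=${lastSeen}`);
  if (!res.ok) return; // transient failure: just try again on the next tick
  const changes: Change[] = await res.json();
  for (const change of changes) {
    applyChange(change); // update the client-side model
    lastSeen = Math.max(lastSeen, change.updatedAt);
  }
}

function applyChange(change: Change): void {
  console.log("applying change", change.id);
}

// Hit the server every 5 seconds whether or not anything changed;
// this is exactly the cost discussed below.
setInterval(() => { pollOnce().catch(console.error); }, 5000);
```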

Polling is heavy, man. Imagine all your clients hitting your servers every 5 seconds even if there are no updates. And if every poll request ends up in a flurry of database requests, your database can get hammered. Of course, caching can smooth out this jagged trip, but if you keep per-client state you need more clever per-client cache views. The overhead of polling can be mitigated somewhat by piggybacking updates on replies to client requests.
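One way to picture the piggybacking idea: the server tacks any pending updates for a client onto the response to a request that client was going to make anyway. A rough Node/Express-style sketch; the pendingUpdates store, the clientId query parameter, and the route name are my own invented illustration, not anything from a real system.

```typescript
import express from "express";

const app = express();

// Hypothetical per-client queue of updates accumulated since each client's last request.
const pendingUpdates = new Map<string, unknown[]>();

app.get("/api/orders", (req, res) => {
  const clientId = String(req.query.clientId ?? "anonymous");

  // Drain anything queued for this client and piggyback it on the normal response.
  const updates = pendingUpdates.get(clientId) ?? [];
  pendingUpdates.delete(clientId);

  res.json({
    orders: loadOrders(), // the data the client actually asked for
    updates,              // changes it hasn't seen yet, delivered "for free"
  });
});

function loadOrders(): unknown[] {
  return []; // placeholder for a real data source
}

app.listen(3000);
```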

So if polling has a high overhead then it makes sense to only send data when there's an update the client should see. That is, we push data to the client. The current push model favorite is Comet: a World Wide Web application architecture in which a web server sends data to a client program (normally a web browser) asynchronously without any need for the client to explicitly request it. It allows creation of event-driven web applications, enabling real-time interaction otherwise impossible in a browser.
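Long polling is one common way a Comet-style push channel gets built over plain HTTP: the client issues a request, the server holds it open until it has something to say (or a timeout fires), and the client reconnects immediately. A rough client-side sketch, with /api/events being a made-up endpoint and the 200/204 convention an assumption for illustration:

```typescript
// Long-polling loop: each request sits open on the server until an event is
// ready (or a timeout fires), so data arrives as soon as it exists.
async function longPoll(): Promise<void> {
  while (true) {
    try {
      const res = await fetch("/api/events");
      if (res.status === 200) {
        const events: unknown[] = await res.json();
        events.forEach(handleEvent);
      }
      // On 204 (server timed out with nothing to say) we just loop and reconnect.
    } catch (err) {
      // Network hiccup: back off briefly before reconnecting.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}

function handleEvent(event: unknown): void {
  console.log("event", event);
}

longPoll();
```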

Nothing comes for free, however, and pushing has a surprising amount of overhead too. A connection has to be kept open between the client and server for the new data to be pushed over. Typically servers don't handle large tables of connections very well, so this approach hasn't worked well and you had to spread the connections over multiple servers. Fortunately operating systems are getting better at handling large numbers of connections.

For every connection you also have to store the data to push to the client and you need a thread to send it. It's easy to see how this could go bad with naive architectures.

Architecturally I've always sided with polling for complete datasets rather than pushing, or polling just for changes. This is the simplest and best self-healing architecture. Machines can go up and down at will and your client view will always end up correct and consistent; there's no chance for a stream of changes to get out of sync. The server side doesn't have to do anything too special, clients already know how to do it, and you use client resources to do the polling and the update on the client side.
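In code, the full-dataset approach is even simpler than the delta-style sketch earlier: each poll throws away whatever the client had and replaces it wholesale, so a missed cycle or a restarted server can't leave the view out of sync. The /api/model endpoint is again hypothetical.

```typescript
// Full-refresh polling: no change log, no per-client server state. Whatever
// the server returns *is* the client's model.
let model: unknown = null;

async function refresh(): Promise<void> {
  const res = await fetch("/api/model"); // hypothetical "give me everything" endpoint
  if (res.ok) {
    model = await res.json(); // wholesale replacement, self-healing by construction
    render(model);
  }
}

function render(m: unknown): void {
  console.log("rendered", m);
}

setInterval(() => { refresh().catch(console.error); }, 5000);
```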

All you have to do to scale polling is have enough machines, smart caching to handle the load, enough bandwidth to handle larger datasets, and a problem where low latency isn't required. That's all :-)

The Comet Daily, not affiliated with Superman I hear, is making a strong case for push in their articles Comet is Always Better Than Polling and 20,000 Reasons Why Comet Scales.

Special application server software is needed because your typical app server can't handle lots of persistent connections. They tend to run out of threads and memory. Greg Wilkins talks about these and other issues in Blocking Servlets, Asynchronous Transport. This is all pretty standard stuff when you build your own messaging system, but I guess it has taken a while to move into the web infrastructure.

With Comet they found:
The key result is that sub-second latency is achievable even for 20,000 users. There is an expected latency vs. throughput tradeoff: for example, for 5,000 users, 100ms latency is achievable up to 2,000 messages per second, but increases to over 250ms for rates over 3,000 messages per second.

Interesting results, especially if your application requires low latency updates. Most people haven't deployed or even considered push based architectures. With Comet it's at least something to think about.

I can't resist adding this cute animation of a llama push me pull me.

Reader Comments (14)

There's something to be said for using a signalling method such as RSS to signal new data and let the client poll for the data after the signal has been received.

December 31, 1999 | Unregistered Commenter Simon Cast

For an active on-screen web session, do you think RSS is a good fit? Latency is increased and you're roughly doubling the number of messages.

December 31, 1999 | Unregistered Commenter Todd Hoff

I'd much rather just set up a 1x1 Flash movie that connects to a Red5 or FMS server than do polling of any sort. You require Flash, but hey, welcome to the internet - it's everywhere.

December 31, 1999 | Unregistered Commenter Brent

With the advent of dedicated HTTP push servers the supposed overhead of idling connections is non-existent. Using edge-triggered network IO libraries like epoll or kqueue, there is no additional overhead for an idling connection. You only pay for actual throughput. At that point, you want the least throughput cost per client for your application, which will always be push.

December 31, 1999 | Unregistered Commenter Michael Carter

RSS is probably not suitable for a realtime system. I think RSS for signalling works when doing regular data dumps and similar procedures.

December 31, 1999 | Unregistered Commenter Simon Cast

This is a problem that seems to come up again and again. HTTP is request-response, and pretty much all the infrastructure that deals with HTTP is geared to client-server. This is the architecture of the web as we know it. I'm not against having a server notify clients of changes, but there's a lot of new problems that arise from turning the conversation around.
I have come across this with creating a multiplayer card game, where any player might add cards to the table at any time. Polling in this case has to be really fast (more than once a second), which gave the server some serious trouble keeping up, even when there were only a handful of requests.

There's a solution to this, however, and that is to use something other than HTTP (we used Jabber) that does allow for server-to-client communication. That's at least what we did. And we got chat, multi-user chat, federated logins and a lot of other niceties on the way, almost for free. And one small Flash implementation of the client gave us the ability to get out to all the web clients around.

My real point is that rather than trying to get HTTP to do things it was not designed to do, use another protocol!

December 31, 1999 | Unregistered Commenter Kyrre

Kyrre,

I don't have too much experience in this field, but I started playing with hidden frames, Flash, and Java applets around 8-10 years ago. At that time we called it "the middle man". We used IRC as the protocol. So I do agree with you.

best regards

December 31, 1999 | Unregistered Commenter atif.ghaffar

Kyrre: It's short-sighted to suggest that HTTP push means you are stuck with HTTP-style interaction and nothing else. It is simply a means of transporting data from the server to the client. There is absolutely *no reason* you can't implement a Jabber client in JavaScript using HTTP push and XHR for communication. The real question is, should you use HTTP push for the transport, or Flash for the transport? After you make that decision, then you implement your IRC or Jabber client on top of your transport. If you use HTTP push, then you get an implementation that works on all platforms, doesn't require external runtimes, loads faster, and gets through all firewalls. And you still get the niceties you speak of, almost for free.

December 31, 1999 | Unregistered Commenter Michael Carter

Michael: I'm not suggesting that using HTTP push limits you to HTTP-style interaction and nothing else. Nor am I saying that creating Jabber clients in JavaScript is not possible. I'm just saying that doing so is protocol abuse. You're basically reducing HTTP to a convenient way to establish sockets and keep them open. Making HTTP do things it was not created for, instead of either using another protocol or supporting the definition of a new one, is a step back, not forward. HTTP has had a lot of success because it is relatively simple to implement in its bare-bones form, yet has enough advanced features to let the client and server agree on a common communication strategy (Accept, Accept-*, Vary, etc.). HTTP is also open enough to allow for the creation of custom headers as extension points, but leaving a socket open for the server to push data back to the client adds new problems that should be addressed either through a new version of the existing protocol or a completely new one.

December 31, 1999 | Unregistered Commenter Kyrre

Apart from the fact that Pearl Jam fans deserve some ribbing for being such emo kids ;-) (just kidding), a great post!

Congratulations!

December 31, 1999 | Unregistered Commenter Anonymous

Kyrre: The crux of the problem with using another transport protocol is that it won't work on all systems, and it may not make it through all firewalls. The extent of your argument seems to be that "No one intended HTTP to be used for stateful, asynchronous communication. Maybe a bad thing will happen if some servers are created that use HTTP for that purpose."

Keep in mind, none of the web was meant as an application framework. HTML is completely oriented towards content. Every site I visit these days is a gross abuse of the HTML and HTTP specifications.

I will grant you that we need to standardize push communication, and that is what the Bayeux protocol is attempting. But it's ridiculous to suggest that we shouldn't build servers specifically optimized for this type of communication. We also have the definition of "server-sent events" in the HTML5 spec, which specifies a good way of using HTTP to push events. And Opera already supports this standard.

December 31, 1999 | Unregistered Commenter Michael Carter

I was a bit too harsh in my last post. I really apologize if I offended anyone.

Michael: the crux of the problem is that your requirements have changed, and a lot of new constraints have to be met. I'm not trying to say that Armageddon will be upon us if we open up for callback-style communication, but is it still a good idea? I really don't think so.

As for every site you visit being a gross abuse of HTML and HTTP, that really puzzles me. On the HTML case, I'm 100% with you. As for HTTP, I really cannot find one single good example of this. This is because HTTP allows for new special-purpose headers (supporting new semantics) and even methods. This is one of the great strengths of HTTP. HTML is a completely different beast. I believe that a lot of people (not all, of course) who move from "basic" web design to AJAX-style sites do so because of the lack of support for DELETE and PUT in HTML forms. The HTML 5 proposition is really just another point for this.

I really hope that you continue making servers optimized for push communication, it's a good way of learning for any standardization attempts.

I'm not trying to cut off all Comet attempts by claiming it's a bad idea (again, the last post was a bit over the top), but what makes me really queasy when I look at Comet and related technologies is that they're discussing scalability in terms of "how many sockets can I handle" or "can I scale horizontally", and not all the stuff that's in between (and I'm not just talking firewalls here: proxies, caching, CDN-type solutions, etc.).
I also worry about losing statelessness, the thing that allows you to lose socket connections or even the network, and still be able to read the content and download more once you regain the connection.

December 31, 1999 | Unregistered Commenter Kyrre

> I also worry about losing statelessness

Will you need to lose this though? My naive assumption is work is being queued and persisted for each client so if a connection breaks the client can reestablish a session and pick up where they left off. If the data is simply transient then this would be a problem.

December 31, 1999 | Unregistered Commenter Todd Hoff

but what makes me really queasy when I look at Comet and related technologies is that they're discussing scalability in terms of "how many sockets can I handle" or "can I scale horizontally", and not all the stuff that's in between (and I'm not just talking firewalls here: proxies, caching, CDN-type solutions, etc.)

You're right -- proxies must be optimized for this, or not used. Caching doesn't make any sense in this context though, nor do CDN-type solutions.

Will you need to lose this though? My naive assumption is work is being queued and persisted for each client so if a connection breaks the client can reestablish a session and pick up where they left off. If the data is simply transient then this would be a problem.

It's a good question. The Bayeux protocol treats data as transient, so it is a problem. I am attempting to address that issue in a recent series of articles. You can see my latest post here: http://cometdaily.com/2008/02/08/colliding-comets-battle-of-the-bayeux-part-2/ The series may also be of interest to you because I discuss a couple of reservations I have about the scalability of some classes of applications that use Bayeux as their communication protocol.

December 31, 1999 | Unregistered Commenter Anonymous
