Sunday, October 7, 2007
Paper: Architecture of a Highly Scalable NIO-Based Server

The article describes the basic architecture of a connection-oriented, NIO-based Java server. It takes a look at a preferred threading model and Java non-blocking I/O, and discusses the basic components of such a server.
Reader Comments (5)
This is a good paper. NIO is a fantastic capability; it's one of my favorite additions to the Java library.
There are a handful of subtle "gotchas" with NIO, however. This article does a good job of explaining the threading issues with selectors and key registration.
If you're aiming for high throughput (i.e. high data rates for a single connection) as well as high scalability (large number of connections), then there's another nifty trick you have to apply.
The usual implementation of the Reactor pattern has the handler thread perform one read, process the buffer, then drop back into the dispatch loop.
This can lead to a TCP stall.
When a selector signals that a channel is ready to read, that means the socket's TCP buffer has received at least some data. By the time you dispatch to the handler thread, more data may have arrived. In fact, even while you read and process the first bufferful of data, more data could be arriving. But, by the time the selector gets re-selected and re-dispatched, the sender will probably have filled the receiver's TCP window. When that happens, the receiver tells the sender "window full" and the sender is required to stop sending data.
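For context, here is a minimal sketch of the one-read-per-dispatch Reactor loop described above. This is a plain single-threaded version; the class name, port, and buffer size are illustrative assumptions, not code from the paper.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    // Minimal single-threaded Reactor loop: one read per readiness event.
    public final class SingleReadReactor {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (true) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel channel = (SocketChannel) key.channel();
                        buffer.clear();
                        int n = channel.read(buffer); // a single read, then back to select()
                        if (n == -1) { key.cancel(); channel.close(); }
                        // process the buffer here, then fall back into the dispatch
                        // loop; this is where the TCP stall described above can arise
                    }
                }
            }
        }
    }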
What my team found was that, in order to prevent TCP stalls, we needed to loop within the handler's method. Once the channel is marked as ready, we needed to keep reading one buffer after another until SocketChannel.read() returned 0, indicating that we had drained the socket's buffer. Only then would we let the thread go back to be dispatched again.
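A fragment of a handler implementing that drain loop might look like the following sketch. The method name and the process(ByteBuffer) helper are assumptions for illustration, not the team's actual code.

    // Hypothetical handler fragment: keep reading until the socket's receive
    // buffer is drained (read() returns 0) or the peer closes (-1), so the
    // sender's TCP window never fills while we wait to be re-selected.
    void drainChannel(SocketChannel channel, ByteBuffer buffer) throws IOException {
        int n;
        while ((n = channel.read(buffer)) > 0) {
            buffer.flip();
            process(buffer);  // application-specific processing (assumed helper)
            buffer.clear();
        }
        if (n == -1) {
            channel.close();  // peer closed the connection
        }
        // Returning here lets the thread go back to the dispatcher.
    }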
At first, this seemed strange, since we would occasionally see a transfer (4-6 MB) completed in a single dispatch. Even though this seems like degenerating into a thread-per-connection model, it isn't, because there was never a blocking read. So we never had a thread sitting around doing nothing; instead, the thread would be maximally busy. This also cured our TCP stalls and let us achieve maximum throughput.
Michael T. Nygard
michael@michaelnygard.com
http://www.michaelnygard.com/
Author of "Release It!"
http://pragmaticprogrammer.com/titles/mnee/index.html
Michael, isn't the downside of doing so much work in the handler that you've made your latency quite a bit less predictable, which may cause other connections to fail because they aren't being processed fast enough? Or didn't you see this in profiling?
Todd,
You make a good point about the latency. We were aiming for maximum throughput above all else, and we also had a fixed upper limit on the number of clients.
Once we hit our throughput numbers, we did not really look at "jitter" across connections. I can say that we did not observe any connection failures, and tcpdump + Ethereal did not show us a detectable level of retransmits.
For high scalability, you would want to verify that each connection is being serviced often enough to avoid network stalls as well as connection errors. In practice, I suppose you could balance the "read until empty" approach with a timeout: for example, loop while the socket still has data remaining and you have not yet spent 100 ms servicing that connection (or whatever time period balances throughput vs. availability). That's just thinking out loud, though; I haven't tried that approach in the real world.
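A rough sketch of that "read until empty or out of time" idea follows; the 100 ms budget and all names are illustrative, and, as the comment says, this variant is untested thinking-out-loud.

    // Hypothetical variant of the drain loop with a time budget: read until
    // the channel is empty or ~100 ms have elapsed, whichever comes first.
    // Java NIO selectors are level-triggered, so any leftover data simply
    // causes the channel to be selected again on the next pass.
    void drainWithBudget(SocketChannel channel, ByteBuffer buffer) throws IOException {
        final long deadline = System.nanoTime() + 100_000_000L; // 100 ms budget
        int n;
        while ((n = channel.read(buffer)) > 0) {
            buffer.flip();
            process(buffer);  // application-specific (assumed helper)
            buffer.clear();
            if (System.nanoTime() >= deadline) break; // yield to other connections
        }
        if (n == -1) channel.close();
    }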
Michael T. Nygard
michael@michaelnygard.com
http://www.michaelnygard.com/
Author of "Release It!"
http://pragmaticprogrammer.com/titles/mnee/index.html
This is of course the old trick from the select() days (even before we could poll()!). It is probably a good idea always to drop out of the loop after processing some fixed number of iterations: suppose somebody is sending you a huge amount of data, as fast as you can process it (this can happen even if you are a simple proxy, if the sending side is faster than you are). You will never drop out of the loop; you might process many GB of data before being able to handle anything else. Suppose, on the contrary, you never perform more than 13 iterations of the loop on any client. The number of system calls required to process a large amount of data from a single client still drops roughly 13-fold, while ensuring you do not starve other clients just because one client is sending a lot of data quickly.
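A count-capped variant of the same handler fragment might look like this; MAX_READS = 13 mirrors the commenter's example figure, and process() is an assumed helper.

    // Hypothetical count-capped drain loop: at most MAX_READS reads per
    // dispatch, so one fast sender cannot starve the other connections.
    // Remaining data just re-selects the channel on the next pass.
    static final int MAX_READS = 13; // the commenter's example figure

    void drainCapped(SocketChannel channel, ByteBuffer buffer) throws IOException {
        int n = 0;
        for (int i = 0; i < MAX_READS && (n = channel.read(buffer)) > 0; i++) {
            buffer.flip();
            process(buffer);  // application-specific (assumed helper)
            buffer.clear();
        }
        if (n == -1) channel.close();
    }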
This comment may be a bit late for the topic, but several frameworks have since been built around this concept:
Apache MINA (I am a member of this project)
Netty
Grizzly
They are all NIO frameworks built around the concepts discussed here.