Sunday
Nov 02, 2008
Strategy: How to Manage Sessions Using Memcached

Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient and smallish, and since they are usually accessed on every page view, removing all that load from the database is a good thing. But as Dormando points out, session caches have problems. If you remove expiration times from the cache and you run out of memory, then no more logins. If a cache server fails or needs to be upgraded, then you just logged out a bunch of potentially angry users.
The middle ground Dormando proposes is using both the cache and the database:
There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
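The steps of the proposal are not listed here, so the following is only a minimal sketch of one way a cache-plus-database session store could behave, assuming reads fall back to the database on a cache miss and writes reach the database only every so often. The class name `HybridSessionStore`, the `sync_interval` parameter, and the plain dicts standing in for memcached and a sessions table are all assumptions made for illustration.

```python
import time


class HybridSessionStore:
    """Sessions live in a cache; the database gets a copy only periodically."""

    def __init__(self, sync_interval=120.0, clock=time.monotonic):
        self.cache = {}   # stand-in for memcached: sid -> (data, last_synced)
        self.db = {}      # stand-in for a sessions table: sid -> data
        self.sync_interval = sync_interval  # hypothetical: seconds between DB writes
        self.clock = clock
        self.db_reads = 0
        self.db_writes = 0

    def load(self, sid):
        entry = self.cache.get(sid)
        if entry is not None:
            return dict(entry[0])            # cache hit: no database access
        self.db_reads += 1                   # cache miss: fall back to the database
        data = self.db.get(sid)
        if data is None:
            return None
        self.cache[sid] = (dict(data), self.clock())  # repopulate the cache
        return dict(data)

    def save(self, sid, data):
        now = self.clock()
        entry = self.cache.get(sid)
        last_synced = entry[1] if entry is not None else None
        if last_synced is None or now - last_synced >= self.sync_interval:
            self.db[sid] = dict(data)        # occasional write-back to the database
            self.db_writes += 1
            last_synced = now
        self.cache[sid] = (dict(data), last_synced)  # every write goes to the cache

    def drop_cache(self):
        """Simulate a cache server failing or being restarted."""
        self.cache.clear()
```

In this sketch a cache failure does not log anyone out: the next `load` finds the last copy written to the database, which is at most `sync_interval` seconds stale. That staleness is the small chance of data loss being traded for far fewer database writes.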
Reader Comments (6)
How would this fit into a REST "strategy"? Wouldn't this violate a RESTafarian's viewpoint?
Forgive my rookie-type questions, but most of my experience has been with a J2EE environment, where I have seen abuse of the session crash more than one JVM/container over the years.
Thank you in advance for your thoughts.
From my experience, you usually end up with some type of "session" information for some special cases, e.g. a shopping basket on shopping sites.
The difference is that they are exposed differently to the client, usually as a part of the URL rather than in a cookie.
Whether you store this in a database or a memcached instance, the server side is usually no different from what you would do with cookies.
Of course, with the REST way of doing this, you will have to deal with issues like checking that the user/client is the one that owns the information, people not being able to guess other people's sessions, etc. (just like you have to with cookies, by the way).
I fully agree with the J2EE session issues. This is usually because people don't realize that they are putting things onto the session, and just continue to add to the complexity of maintaining it. I've settled on a general rule of thumb that works quite well: don't use sessions unless you really, really have to, and keep as little information there as you can when you do use them.
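The two checks mentioned above (identifiers that cannot be guessed, and confirming the caller owns the session) can be sketched roughly as follows. This is an illustration only: the function names, the `user_id` field, and the dict standing in for the session store are hypothetical.

```python
import secrets


def new_session_id():
    # 32 random bytes from the OS CSPRNG, URL-safe so the ID can sit in a URL or a cookie
    return secrets.token_urlsafe(32)


def owns_session(sessions, session_id, user_id):
    """Return True only if session_id exists and belongs to user_id."""
    record = sessions.get(session_id)
    if record is None:
        return False          # unknown or guessed ID
    return record["user_id"] == user_id
```

The same two functions apply whether the identifier travels in the URL or in a cookie, and whether `sessions` is backed by a database or by memcached.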
We just completed a similar component for storing PHP sessions in zookeeper. This is superior to the database solution for a couple of reasons:
a) zookeeper is truly HA right out of the box. Node failures, network partitions and similar nightmares have little effect on zookeeper. Restarting a failed node is as trivial as rebooting the machine or process. Restarted nodes automatically update themselves to the current running state.
b) zookeeper has comparable transaction rates to memcached but maintains persistent, reliable copies of the data. A 5 node zookeeper system has been reported to have >250,000 transactions per second at 20% write-through rate.
c) the code is vastly simpler. All told, our solution requires a few dozen lines of PHP and a hundred lines or so of Java. All updates go one place and there really is hardly any logic required at all, just transaction pass-throughs.
d) zookeeper is inherently backed up.
e) because I used thrift for the high-level interface, you can use this for a session store from Java, PHP, C++, Erlang, perl or just about any language you care to name.
f) the administration guide to zookeeper is about 2 pages long. Nobody knows how long the admin guide to mysql is any more.
Some downsides include:
a) I haven't contributed back our code yet.
b) You probably aren't running zookeeper now, so you will have to start using it. Since you probably have a database and maybe even a memcached already, this is a slight downer. On the other hand, you probably *should* be running zookeeper.
PHP's memcache library supports hashing algorithms to store sessions in x places, so if you lose a node you have others. This allows you to upgrade, reboot, etc. without logging people out. Stay away from DBs; they really don't scale.
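php.ini:
[memcache]
; Load the PECL memcache extension
extension=memcache.so
; Store PHP sessions in memcached
session.save_handler = memcache
; Servers to spread sessions across (hypothetical hosts; substitute your own)
session.save_path = "tcp://cache1.example.com:11211, tcp://cache2.example.com:11211"
; Consistent hashing, so removing a server remaps only that server's keys
memcache.hash_strategy = "consistent"
memcache.max_failover_attempts = 100
; Try another server when the chosen one is unreachable
memcache.allow_failover = 1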
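To illustrate why the "consistent" hash strategy in the settings above matters, here is a minimal sketch of a consistent hash ring. It is not the memcache extension's actual implementation; the `HashRing` class, the `replicas` count, and the node names are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to nodes so that removing a node only moves that node's keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual points per node (hypothetical default)
        self._points = []          # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._points, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key):
        if not self._points:
            raise LookupError("no nodes in ring")
        # First point clockwise from the key's hash, wrapping around at the end
        i = bisect.bisect_left(self._points, (self._hash(key), ""))
        return self._points[i % len(self._points)][1]


ring = HashRing(["cache1", "cache2", "cache3"])
owner = ring.node_for("sess_abc123")
ring.remove("cache2")              # simulate a failed or rebooted server
new_owner = ring.node_for("sess_abc123")
```

In this sketch, a session key keeps its server unless that server was the one removed. Note that the sessions held on the removed server are not recovered by rehashing alone: their keys simply map to a surviving server that has never seen them, unless each session was also written to more than one server.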
You failed to mention zookeeper runs on hadoop, which brings about a whole number of concerns. Yahoo successfully runs it on 400-node clusters, but there is other speculation around suggesting that hadoop doesn't run so well on smaller (< 10 node) clusters.
I've tried memcache with PHP using session.save_handler = memcache;
Although it was extremely fast, there were too many misses where session data wasn't returned, causing users to be kicked from the app.
I would be curious to try:
memcache.hash_strategy = "consistent"
memcache.max_failover_attempts = 100
memcache.allow_failover = 1
and see if it solves the cache-miss problem.
Regardless, writing your own session handler with memcached backed by a DB sounds like a good solution.