Sunday, Nov 2, 2008

Strategy: How to Manage Sessions Using Memcached

Dormando shows an enlightened middle way for storing sessions in both the cache and the database. Sessions are a perfect cache candidate because they are transient and smallish, and since they are usually read on every page view, taking all that load off the database is a good thing. But as Dormando points out, session caches have problems. If you remove expiration times from the cache and you run out of memory, then no more logins. If a cache server fails or needs to be upgraded, then you have just logged out a bunch of potentially angry users.

The middle ground Dormando proposes is using both the cache and the database:

  • Reads: read from the cache first, then the database. Typical cache logic.
  • Writes: write to memcached every time, write to the database every N seconds (assuming the data has changed).

    There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.
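The read and write rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dormando's implementation: plain dicts stand in for the memcached client and the sessions table, and the names `HybridSessionStore`, `flush_interval`, and `clock` are hypothetical.

```python
import time


class HybridSessionStore:
    """Cache-first reads; every write hits the cache, and the database
    is written at most once per flush_interval seconds per session."""

    def __init__(self, cache, db, flush_interval=120.0, clock=time.monotonic):
        self.cache = cache                    # stand-in for a memcached client
        self.db = db                          # stand-in for a sessions table
        self.flush_interval = flush_interval  # the "N seconds" from the text
        self.clock = clock                    # injectable so it can be tested

    def read(self, session_id):
        entry = self.cache.get(session_id)
        if entry is not None:
            return dict(entry["data"])
        data = self.db.get(session_id)
        if data is None:
            return None
        # Cache miss: repopulate so the next read skips the database.
        self.cache[session_id] = {
            "data": dict(data),
            "last_db_write": self.clock(),
            "dirty": False,
        }
        return dict(data)

    def write(self, session_id, data):
        now = self.clock()
        old = self.cache.get(session_id)
        if old is None:
            # New (or evicted) session: persist right away.
            self.db[session_id] = dict(data)
            dirty, last = False, now
        else:
            # Remember unflushed changes from earlier writes as well.
            dirty = old["dirty"] or data != old["data"]
            last = old["last_db_write"]
            if dirty and now - last >= self.flush_interval:
                self.db[session_id] = dict(data)
                dirty, last = False, now
        self.cache[session_id] = {
            "data": dict(data),
            "last_db_write": last,
            "dirty": dirty,
        }
```

In this sketch, the window of possible data loss is whatever changed since the last database write, so at most `flush_interval` seconds of updates for a session whose cache entry disappears.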
Reader Comments (6)

    How would this fit into a REST "strategy"? Wouldn't this violate a RESTafarian's viewpoint?

    Forgive my rookie-type questions, but most of my experience has been with a J2EE environment, where seeing people abuse the session has crashed more than one JVM/container over the years.

    Thank you in advance for your thoughts.

    December 31, 1999 | Unregistered Commenter Dave Largo

    In my experience, you usually end up with some type of "session" information for some special cases, e.g. a shopping basket on shopping sites.
    The difference is that it is exposed differently to the client, usually as part of the URL rather than in a cookie.
    Whether you store this in a database or a memcached instance is usually no different on the server side from what you would do with cookies.

    Of course, with the REST way of doing this, you will have to deal with issues like checking that the user/client is the one that owns the information, making sure people can't guess other people's sessions, etc. (just like you have to with cookies, btw).
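The two checks mentioned here, unguessable session identifiers and an ownership test, might look roughly like this in Python. It is a minimal sketch: the helper names `new_session_id` and `owns_session` are hypothetical, and a dict stands in for whatever store (database or memcached) holds the sessions.

```python
import secrets


def new_session_id():
    # 32 bytes from the OS CSPRNG, encoded URL-safe so the identifier
    # can travel in a URL or in a cookie.
    return secrets.token_urlsafe(32)


def owns_session(sessions, session_id, user_id):
    """Return True only if the session exists and belongs to user_id."""
    record = sessions.get(session_id)
    if record is None:
        return False
    return record["user_id"] == user_id
```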

    I fully agree about the J2EE session issues. This usually happens because people don't realize that they are putting things onto the session, and they just continue to add to the complexity of maintaining it. I've settled on a general rule of thumb that works quite well: don't use sessions unless you really, really have to, and keep as little information there as you can when you do.

    December 31, 1999 | Unregistered Commenter Kyrre

    We just completed a similar component for storing PHP sessions in zookeeper. This is superior to the database solution for a couple of reasons:

    a) zookeeper is truly HA right out of the box. Node failures, network partitions and similar nightmares have little effect on zookeeper. Restarting a failed node is as trivial as rebooting the machine or process. Restarted nodes automatically update themselves to the current running state.

    b) zookeeper has comparable transaction rates to memcached but maintains persistent, reliable copies of the data. A 5 node zookeeper system has been reported to have >250,000 transactions per second at 20% write-through rate.

    c) the code is vastly simpler. All told, our solution requires a few dozen lines of PHP and a hundred lines or so of Java. All updates go one place and there really is hardly any logic required at all, just transaction pass-throughs.

    d) zookeeper is inherently backed up.

    e) because I used thrift for the high-level interface, you can use this for a session store from Java, PHP, C++, Erlang, perl or just about any language you care to name.

    f) the administration guide to zookeeper is about 2 pages long. Nobody knows how long the admin guide to mysql is any more.

    Some downsides include:

    a) I haven't contributed back our code yet.

    b) You probably aren't running zookeeper now, so you will have to start using it. Since you probably have a database and maybe even a memcached already, this is a slight downer. On the other hand, you probably *should* be running zookeeper.

    December 31, 1999 | Unregistered Commenter Ted Dunning

    PHP's memcache library supports hash algorithms to store sessions in x places, so if you lose a node you have others. This allows you to upgrade, reboot, etc. without logging people out. Stay away from DBs; they really don't scale.

    php.ini:

    [memcache]
    extension=memcache.so
    session.save_handler = memcache
    memcache.hash_strategy = "consistent"
    memcache.max_failover_attempts = 100
    memcache.allow_failover = 1
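To illustrate the idea behind a "consistent" hash strategy with failover, here is a small Python sketch of a hash ring. It shows the general technique only and makes no claim about how the PHP memcache extension implements it; the class `ConsistentHashRing`, its `replicas` parameter, and the `down` argument are hypothetical names.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` points on the ring to even out the load.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def nodes_for(self, key, count=1, down=()):
        """Return up to `count` distinct live nodes for `key`,
        walking clockwise from the key's position on the ring."""
        if not self._ring:
            return []
        start = bisect.bisect(self._hashes, self._hash(key))
        found = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in down and node not in found:
                found.append(node)
                if len(found) == count:
                    break
        return found
```

With a ring like this, marking one server as down only moves the keys that lived on that server; keys on the surviving servers stay where they were, which is what keeps most users logged in during a reboot or upgrade.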

    December 31, 1999 | Unregistered Commenter Anonymous

    You failed to mention zookeeper runs on hadoop, which brings about a whole number of concerns. Yahoo successfully runs it on 400-node clusters, but there is other speculation suggesting that hadoop doesn't run so well on smaller (< 10 node) clusters.

    December 31, 1999 | Unregistered Commenter Anonymous

    I've tried memcache with PHP using session.save_handler = memcache;

    Although it was extremely fast, there were too many misses where session data wasn't returned, causing users to be kicked from the app.

    I would be curious to try:

    memcache.hash_strategy = "consistent"
    memcache.max_failover_attempts = 100
    memcache.allow_failover = 1

    and see if it solves the cache-miss problem.

    Regardless, writing your own session handler with memcached backed by a DB sounds like a good solution.

    December 31, 1999 | Unregistered Commenter Chris
