« Total Cost of Ownership for different web development frameworks | Main | Biggest Under Reported Story: Google's BigTable Costs 10 Times Less than Amazon's SimpleDB »
Saturday
May312008

memcached and Storage of Friend list

My first post, please be gentle. I know it is long. You are all like doctors - the more info, the better the diagnosis.

-----------
What is the best way to store a list of all of your friends in the memcached cache (a simple boolean saying “yes this user is your friend”, or “no”)? Think Robert Scoble (26,000+ “friends”) on Twitter.com. He views a list of ALL existing users, and in this list, his friends are highlighted.

I came up with 4 possible methods:
--store in memcache as an array, a list of all the "yes" friend ID's
--store your friend ID's as individual elements.
--store as a hash of arrays based on last 3 digits of friend's ID -- so have up to 1000 arrays for you.
--comma-delimited string of ID's as one element

I'm using the second one because I think it is faster to update. The single array or hash of arrays feels like too much overhead calculating and updating – and even just loading – to check for existence of a friend.

The key is FRIEND[small ID#]_[big ID#]. The value is 1.
This way there are no dupes. (I add u as friend, it always adds me as ur friend...I remove u, u remove me).
Store with it 2 additional flags: One denotes start of entries. One denotes end of entries.
As friends are added, the end flag position relative to new friends will become meaningless, but that is ok (I think).
To see if someone is your friend, the system checks if both start and end flags exist.
If both exist, it can check for existence of friend ID - if exists, then friend.
Start flag is required. If start flag is pushed out of cache, we must assume some friends were also pushed out.

Currently, the system loads from DB in a daemon in the background after you log in (if two flags are not already set).
Until the two flags are set, it does db lookups.
There is no timeout on the data in cache.
Adding/removing friends to your account adds/removes to/from memcache - so, theoretically, it might never have to pre-load anything.

Downside of my method is if the elements span multiple servers and one dies, you loose some of your friends (that's the upside of using arrays).
I don't know how to resolve if the lost box didn't contain either of the flags -- in that case, the users' info will NEVER get refreshed. This is my concern.

Any ideas?

Thanks so much!!!

Reader Comments (5)

Even at 26,000 users, assuming a user id is 4 bytes, the friend list is about 104K. Well within memcached limits. Linearly searching even an array of 26,000 items pinned in memory is pretty fast. Deserialization/serialization is far slower. For extra speed store binary blobs. Don't serialize into or out of objects. Wrap the data using the pimpl idiom (http://www.google.com/search?hl=en&q=pimpl&btnG=Google+Search) that operates on the binary data. But I wouldn't do this unless it was really necessary.

December 31, 1999 | Unregistered CommenterTodd Hoff

You should have no problem. It will be very fast :)
You can always add multiple memcached boxes and code the application to pull data from certain memcached box....

December 31, 1999 | Unregistered CommenterLinuxAdmin

If just need basic set operations, use a bloom filter?
Will reduce the size of the data required below the 104K for 26,000*4, at the expensive of some false positives.

December 31, 1999 | Unregistered CommenterAnonymous

Zip drives and zip softwares could also be used to easily lessen the storage size of data.
-----
http://underwaterseaplants.awardspace.com">sea plants
http://underwaterseaplants.awardspace.com/seagrapes.htm">Sea grapes...http://underwaterseaplants.awardspace.com/plantroots.htm">plant roots

December 31, 1999 | Unregistered Commenterfarhaj

I'll second the use of bloom filters for this application. A bloom filter is pretty much ideal for what you're describing, and it's fairly easy to use with memcached. I recently did something similar, with a *much* larger list, as a bloom filter. The filter was base64'd and stored in a memcached slab, and could easily be retrieved and XOR'd against other lists. Even in a language which reeeallllyyyy doesn't easily allow for bit arrays this was fast and easy.

December 31, 1999 | Unregistered Commenterquellish

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>