Tuesday, January 29, 2008

Building scalable storage into the application - instead of MogileFS, OpenAFS, etc.

I am planning the scaling of a hosted service, similar to TypePad etc., and would appreciate feedback on my plan so far.

Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer.

So, I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think).

So, for our database table of uploaded files, here's how it currently looks (simplified):

fileid (pkey)
filename
ownerid

To add replication and scalability, I would add a few more columns:

serveroneid
servertwoid
serverthreeid
s3

At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serveroneid" column. Then hourly or so, a cron job will run through the "files" table and copy any files that haven't been replicated (where servertwoid and serverthreeid are null) to other servers. Another cron will copy files to Amazon's S3 for an extra backup (if the s3 column is null, then copy to S3).
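Very roughly, I picture the hourly cron looking something like the sketch below (Python just to make it concrete; the hosts, credentials, rsync copy and boto3 S3 upload are placeholders rather than final choices):

import subprocess
import boto3    # assumed S3 client library
import pymysql  # assumed MySQL driver; any DB-API module would do

SERVERS = {2: "storage2.example.com", 3: "storage3.example.com"}  # placeholder hosts
FILE_ROOT = "/var/files"                                          # placeholder path

def replicate():
    db = pymysql.connect(host="dbhost", user="app", password="...", db="app")
    s3 = boto3.client("s3")
    with db.cursor() as cur:
        # Find files that are missing a replica or the S3 backup copy.
        cur.execute("SELECT fileid, filename, servertwoid, serverthreeid, s3 "
                    "FROM files WHERE servertwoid IS NULL "
                    "OR serverthreeid IS NULL OR s3 IS NULL")
        for fileid, filename, two, three, backed_up in cur.fetchall():
            src = f"{FILE_ROOT}/{filename}"
            for column, server_id, present in (("servertwoid", 2, two),
                                               ("serverthreeid", 3, three)):
                if present is None:
                    # Copy to the replica server and record where the file now lives.
                    subprocess.run(["rsync", "-a", src,
                                    f"{SERVERS[server_id]}:{FILE_ROOT}/"],
                                   check=True)
                    cur.execute(f"UPDATE files SET {column} = %s WHERE fileid = %s",
                                (server_id, fileid))
            if backed_up is None:
                # Extra off-site copy on S3.
                s3.upload_file(src, "my-backup-bucket", filename)
                cur.execute("UPDATE files SET s3 = 1 WHERE fileid = %s", (fileid,))
    db.commit()

if __name__ == "__main__":
    replicate()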

Now at the client level, when the page to display the file is loaded, it will know which of the three servers it can pull the file from. If one server goes down, the application will know and use one of the other servers.
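For the read path, a minimal sketch of the fallback might look like this (again Python for illustration only, since the app itself is PHP; the host mapping and the cheap reachability check are placeholders):

import socket

SERVER_HOSTS = {1: "storage1.example.com",
                2: "storage2.example.com",
                3: "storage3.example.com"}            # placeholder mapping
S3_URL = "https://my-backup-bucket.s3.amazonaws.com"  # placeholder bucket URL

def is_up(host, port=80, timeout=0.5):
    # Cheap per-request reachability check; in practice the app would cache
    # server health rather than probe on every page view.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def file_url(row):
    # row is a dict-like record from the files table.
    for column in ("serveroneid", "servertwoid", "serverthreeid"):
        server_id = row.get(column)
        if server_id and is_up(SERVER_HOSTS[server_id]):
            return f"http://{SERVER_HOSTS[server_id]}/files/{row['filename']}"
    # Last resort: serve the S3 backup copy.
    return f"{S3_URL}/{row['filename']}"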

When storage capacity runs low, another server is added with a big drive, perhaps not even having RAID on it. These servers will also be used for PHP serving through load balancing.

I'm probably missing some big drawbacks of this approach, but it appeals to me that it should be quite simple to implement and less complex to administer than systems like MogileFS, which would present a lot more unknowns.

Reader Comments (2)

Infrastructure is fun, but I might go with a well tested solution so I could concentrate on delivering the features that will make the site successful. Hide it all behind a service interface so you can replace it if needed.

Another option is Hadoop DFS (http://wiki.apache.org/hadoop/).

December 31, 1999 | Unregistered Commenter Todd Hoff

I would agree with Todd that you should go with something more tested and simpler.

Also, it really depends on what you need the replication for.

From what I have read in your post, your solution does not have much to do with scalability.
Why would you be more scalable if you copy your files to 10 servers instead of one?

Logic like:
if file not on server 1 then try server 2
if file not on server 2 then try server 3
if file not on server 3 then try Amazon S3
will perhaps make your site more available, but at the cost of slowing it down.

If high availability is the goal, then you should look at using a highly available NAS (or homegrown NFS).
If you want to go complicoolated, then you may also look at GFS.

If you are trying something like "get the file fastest, from local disk first", then yes, replication makes sense.
My rule: always, always and always only trust a single source for your storage (replication is fine, but it's disposable data).

If this is the case, then the hourly job does not make sense; you should implement something like copy-on-write to sync the file when it's changed. You can do this with a combination of some scripts and Linux's inotify handler, or some other file alteration monitor.
This would only work with a small number of files and directories; with larger data sets you will get into trouble.
I wrote a tutorial a few years back that tried to do something like this but failed when it had to deal with thousands of files.
http://new.linuxfocus.org/English/March2001/article199.shtml
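A minimal sketch of that idea, assuming the watchdog package (which uses inotify on Linux) and placeholder paths and hosts:

import subprocess
import time
from watchdog.observers import Observer            # assumed: inotify-backed monitor
from watchdog.events import FileSystemEventHandler

WATCH_DIR = "/var/files"                                      # placeholder directory
REPLICAS = ["storage2.example.com", "storage3.example.com"]   # placeholder hosts

class ReplicateOnChange(FileSystemEventHandler):
    # Push a file to the replicas as soon as it is created or modified,
    # instead of waiting for an hourly cron run.
    def on_created(self, event):
        self._push(event)

    def on_modified(self, event):
        self._push(event)

    def _push(self, event):
        if event.is_directory:
            return
        for host in REPLICAS:
            subprocess.run(["rsync", "-a", event.src_path, f"{host}:{WATCH_DIR}/"],
                           check=False)

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(ReplicateOnChange(), WATCH_DIR, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)   # keep the process alive; watchdog works in the background
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

As warned above, this only scales to a few thousand watched files and directories before the monitoring itself becomes the bottleneck.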

So instead of an hourly job, you can run a continuous script that monitors your files table and acts when something is changed (using triggers, etc.).
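For example (all names here are assumptions, not from the post): a trigger on the files table appends changed ids to a small file_changes queue table, and a long-running script drains it:

import time
import pymysql  # assumed MySQL driver

def watch_changes(replicate_one, poll_seconds=5):
    # replicate_one(fileid) would do the actual copy, e.g. the rsync/S3
    # logic from the cron job described in the post.
    db = pymysql.connect(host="dbhost", user="app", password="...", db="app")
    while True:
        with db.cursor() as cur:
            cur.execute("SELECT changeid, fileid FROM file_changes ORDER BY changeid")
            for changeid, fileid in cur.fetchall():
                replicate_one(fileid)
                cur.execute("DELETE FROM file_changes WHERE changeid = %s", (changeid,))
        db.commit()
        time.sleep(poll_seconds)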

Not so long ago, I tried something similar for caching our CMS content on only 4 servers, and they were overwhelmed because the data was changing very quickly. Now, with 40 servers, I don't want to think about it.
If you don't have that much rapidly changing data, then it might work for you, and I will happily share the replicator with you.

Also to note: deleting the cache can also quickly become a problem.
Nobody will care if they see image_xyz 0.1 seconds quicker, but your customers will KILL :-) you if they randomly see an older image after having updated it (smiley face, angry face, smiley face, angry face).

Hope this helps.
Best regards.

Atif

December 31, 1999 | Unregistered Commenter atif.ghaffar
