Saturday, Mar 15, 2008

New Website Design Considerations

I am in the design phase of getting a website up and running that will have scalability as a main concern. I am looking for opinions on architecture and the like for this endeavor.

The site has a few unique characteristics that make scalability difficult. Every user will have a fairly large amount of data that other users can search, and the site will be built entirely around search. The catch is that users will always be searching with a stipulation of "within 'n' miles of me". I imagine that will kill the possibility of query caching for most searches, since each result set depends on where the searcher is.
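To make the "within 'n' miles of me" requirement concrete, here is a minimal Python sketch of one common way to do it: a cheap latitude/longitude bounding box as a pre-filter, followed by an exact great-circle (haversine) check. The function names and the `(user_id, lat, lon)` tuple layout are assumptions made for illustration and are not part of the original question. The sketch also assumes the search circle does not touch a pole or cross the 180° meridian.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle.

    Assumes the circle stays away from the poles and the 180th meridian.
    """
    r = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def users_within(users, lat, lon, radius_miles):
    """Filter (user_id, lat, lon) tuples to those within radius_miles of (lat, lon)."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_miles)
    hits = []
    for user_id, ulat, ulon in users:
        # Cheap rectangular test first; in a database this part can use an index.
        if not (min_lat <= ulat <= max_lat and min_lon <= ulon <= max_lon):
            continue
        # Exact test only for the candidates that survived the box.
        if haversine_miles(lat, lon, ulat, ulon) <= radius_miles:
            hits.append(user_id)
    return hits
```

The box test is the part a database can answer with an ordinary index on latitude and longitude, so the exact distance only has to be computed for a small candidate set. It also shows why result caching is hard: the box moves with every searcher.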

I have extensive experience with PHP and MySQL, some experience with ASP.NET/C#, and some experience with Perl, but I can learn anything fast. The site will start out on a single server, but I want to be 100% certain that I architect the code and databases so that scaling will be simple.

What language should I code the site in? What DB would you use: Postgres, MySQL, MSSQL, BerkeleyDB? Should we shard the database by location? By user? Not at all?

What does everyone think about possible architectures for this?

Reader Comments (5)

Sounds something like a dating site. You might want to take a look at PlentyOfFish (http://highscalability.com/plentyoffish-architecture). Don't worry about the language. Pick the one you think you can get work done in and/or that supports the libraries you need. PHP and MySQL are fine.

An extension to Lucene seems to support geographical searching (http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm). MySQL supports OpenGIS features, so at that point it might be about just applying the different sharding/partitioning strategies.

December 31, 1999 | Unregistered Commenter Todd Hoff

I wasn't aware of the Lucene extension. I'll have to give that a try.

I think my main concern is getting the database design into a workable, scalable state. Because of the two-tiered search (first location, then data), I was thinking about sharding by user/location and keeping a very simple, central database to decide which shard to hit. Basically, it would contain just UserIDofSomeSort, ShardId, Longitude, Latitude. This would allow me to hit that database with a quick query and then limit my other query by UserID on the shard. Sharding is all very new to me, as are denormalized tables, so does this look like a viable solution? Do you use standard replication to back up shards? Any other ideas/input?
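As a rough illustration of the central lookup described above, the following Python sketch uses an in-memory SQLite database as a stand-in for the central database. The table name `user_directory`, the `profiles` table on each shard, and the helper names are all hypothetical. Only the four columns (a user ID, ShardId, Longitude, Latitude) come from the description. The first step asks the directory which users fall inside a latitude/longitude box and groups them by shard; the second step builds the per-shard query limited to those user IDs.

```python
import sqlite3
from collections import defaultdict


def make_directory(rows):
    """Build the central directory from (user_id, shard_id, latitude, longitude) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE user_directory ("
        " user_id INTEGER PRIMARY KEY,"
        " shard_id INTEGER NOT NULL,"
        " latitude REAL NOT NULL,"
        " longitude REAL NOT NULL)"
    )
    # Index so the box lookup does not scan the whole table.
    db.execute("CREATE INDEX idx_lat_lon ON user_directory (latitude, longitude)")
    db.executemany("INSERT INTO user_directory VALUES (?, ?, ?, ?)", rows)
    return db


def candidates_by_shard(db, min_lat, max_lat, min_lon, max_lon):
    """Step 1: quick query on the central database, grouped as {shard_id: [user_id, ...]}."""
    cur = db.execute(
        "SELECT user_id, shard_id FROM user_directory"
        " WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?"
        " ORDER BY user_id",
        (min_lat, max_lat, min_lon, max_lon),
    )
    plan = defaultdict(list)
    for user_id, shard_id in cur:
        plan[shard_id].append(user_id)
    return dict(plan)


def shard_query(user_ids):
    """Step 2: SQL to run on one shard, limited to the candidate user IDs.

    'profiles' is a hypothetical per-shard table; data predicates would be appended here.
    """
    placeholders = ", ".join("?" for _ in user_ids)
    return f"SELECT * FROM profiles WHERE user_id IN ({placeholders})"
```

One consequence of this layout is that the central database sees every search, so it has to stay small and well indexed, which is presumably why it holds only four columns.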

I'd like to flesh this all out in detail before going further with my design, because the system I recently took over has a nightmare of a database issue that causes quite a bit of downtime. I would love to avoid these issues from the start.

Thanks in advance,
Mike

December 31, 1999 | Unregistered Commenter uprise78

A shard is backed up like any other database, so nothing special there. I wonder how much including the waypoint in the mapping table actually helps? Won't your shard be scaled to handle the load already? So just going to your user shard and performing the query should scale?

You may want to also look at Sharding the Hibernate Way (http://highscalability.com/sharding-hibernate-way).

December 31, 1999 | Unregistered Commenter Todd Hoff

You may be right about not including my waypoint in the mapping table. I was considering using it to limit the number of shards that need to be searched. Assuming 90% of searches will include location, do you think it would be better to query all shards for each search, or to shard based on waypoint and leave the waypoint in the mapping table?
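To illustrate the second option (sharding by waypoint so a search touches only some shards), here is a small Python sketch. It assumes, purely for illustration, that the world is cut into fixed 1-degree grid cells and that a hypothetical `cell_to_shard` dictionary records which shard owns each populated cell. Neither the grid nor the names come from the discussion above.

```python
import math


def cell_of(lat, lon, cell_deg=1.0):
    """Grid cell (row, col) containing a point; floor keeps negative coordinates consistent."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cells_in_box(min_lat, max_lat, min_lon, max_lon, cell_deg=1.0):
    """All grid cells overlapped by a latitude/longitude search box."""
    rows = range(math.floor(min_lat / cell_deg), math.floor(max_lat / cell_deg) + 1)
    cols = range(math.floor(min_lon / cell_deg), math.floor(max_lon / cell_deg) + 1)
    return [(r, c) for r in rows for c in cols]


def shards_for_search(box, cell_to_shard, cell_deg=1.0):
    """Shards that must be queried for a search box; cells with no users are skipped."""
    return {
        cell_to_shard[cell]
        for cell in cells_in_box(*box, cell_deg=cell_deg)
        if cell in cell_to_shard
    }
```

With an assignment that keeps neighbouring cells on the same shard, a typical radius search resolves to one or two shards instead of all of them. The trade-off is that densely populated areas can make one shard much hotter than the others.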

December 31, 1999 | Unregistered Commenter uprise78

I was just being stupid. You want to go from location to users, in which case your organization seems right to me.

December 31, 1999 | Unregistered Commenter Todd Hoff
