« Google: Introduction to Distributed System Design | Main | Mailinator Architecture »
Friday
Jan252008

Application Database and DAL Architecture

Hi gurus, I'm totally new to this high scalability thing. I'm trying to create a website with scalability in mind (personal project). In my application I'll have forums for different groups of people (each group will have their own forums, members of groups can still post in other groups' forums but each group will mainly be using their forums most of the time). Now, I'm going to start with about 2000 groups with the potential of reaching up to 10000 groups (this is the maximum due to the nature of my application). I was thinking that having all posts in one table will be way too much for one table (esp. that some groups are expected to post hundreds or even thousands times per day, let's say about 500 of the groups, the rest of the groups won't be that active though) as I'll have to index the PostID, ParentPostID, GroupID and PostDate which can produce large indexes (consequentially causing slow inserts) if having everything in one table. So, I'm thinking of a way to divide the posts in many tables, here are some of the things I thought of:

1. Creating a separate table for every group e.g. ForumsPosts_x, where x is the GroupID (which has its own pros and cons, some of the pros that I can have small indexes and also use identity columns, I also assume it should be easy to move the tables to other databases should the application grow. Well, I posted this idea on some other forums and most people told me it's a sign of bad design if I have thousands of tables in my database. I was also concerned how to design my DAL if I do this. Should I use sprocs with dynamic SQL or use SQL text directly in my DAL code and what about the query plan caching if having a large number of tables .. so many problems here!)

2. Put everything in one table and if the site grows move some of the groups to another database (I'm concerned though about having many databases on the same machine, will it affect performance? of course I won't have hundreds of databases on the same machine but may be about 5 or even 10 databases on the same machine)

I also have some other questions:

I'm going to use ASP.NET for this project, I was planning initially to use SQL Server as a database but I'm worried about the SQL Server part and the cost of growth, should I consider an alternative like MySQL? But how will it perform with ASP.NET though in a high scalability scenario?

Any suggestions are highly appreciated...

Reader Comments (6)

That's a lot of groups. You might want to take a look at the http://highscalability.com/livejournal-architecture">LiveJournal and http://highscalability.com/mixi-jp-architecture">Mixi architecture for ideas. The downside to SQL server is the licenses get expensive as you expand. There are adapters so you can use MySQL from ASP.NET. That might be a more reasonable growth path for the budget conscious.

December 31, 1999 | Unregistered CommenterTodd Hoff

Thanks a lot for your reply, I'm still really confused whether to use one table or create separate tables for every group, I'm finally thinking about using one table for all the groups with the possibility of having a separate table for a group in case it grows too large.

Regarding MySQL, actually I know it can be used from ASP.NET (have used it in a couple of projects) but I was worried about the performance in a high scalability application as it's not really common to find MySQL used with ASP.NET, so I don't have much information about this.

December 31, 1999 | Unregistered CommenterWaleed Eissa

Waleed,

You might want to do both.
Keep a master db where you put the master data. All nicely in related tables etc,
Then create per forum tables (de-normalized) for reading, searching etc.

Keep the master db for writting only and the reading db that is recreatable with a script.
We use this technique for our spam quarantine database that recieved around 3-5 million records per day.
This database is then split into databases per domain when a request comes in to look at the quarantine.

The total 10 day db is anywhere between 30 - 50 million rows and it would be almost impossible to use it for reading and writting.

We do similar also for our log-processing databases.

Lets say you have a very simple table called forums and you want to split them to forum_1, forum_2, etc.
You can do something like this.

1. create table if not exists forum_1 like forum; -- This will create the table forum_1 as the replica of forum if it does not exists.
2. Get the highest if from the forum_1, lets say its 12345;
3. Re-populate the table.
insert into forum_1 select * from forum where id>12345 and forum_id=1;
This statement will in a couple of seconds depending on the data populate forum_1 table.

Note: The first time, it may take very long time but every other execution will take less time.

This way you can also mix and match storage engines, for example the main db uses innodb while the target db uses myisam, etc.

Hope this helps.

best regards

December 31, 1999 | Unregistered Commenteratif.ghaffar

Hey man,

MS-SQL becomes a problem when it comes to licensing costs. MySQL is a great alternative, as well as PostgreSQL would do the job.
You could focus in making your code somewhat cross-platform, so you will be able to host your asp.net site on linux in a near future. By saying "cross-platform" I mean: do not use Operational System specifics paths. Like "c:\webroot".... Instead, use asp.net specific commands to grap the paths.

MONO's ASP.NET implementation is giving a huge value to PostgreSQL. So I think the "defaults" on asp.net mono databases will be PostgreSQL.

December 31, 1999 | Unregistered CommenterAnonymous

The above was posted by Marcelo Szwarc... it's me... :-)

December 31, 1999 | Unregistered CommenterMarcelo Szwarc

Thanks a lot for all your replies guys ...

December 31, 1999 | Unregistered CommenterWaleed Eissa

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>