Entries in General Discussion (161)

Tuesday
Jan292008

Too many databases

Hi, I am using drupal for my clients website, and was thinking is it possible to host all ( about 500) of them on the same server(maybe VPS or dedicated). Here is the situation..... Each clients website has a database with about 50 tables each, all the databases are small in size about 2-5 MB .... and the websites are low traffic websites with say.. 50 hits/day on avg.... that means about 2000 queries/db/day ..... (avg 40 queries per hit).... Wanted to know if it is possible to have so many databases about 500 on the same server? what are the things that i should look into if i should make this happen?

Click to read more ...

Tuesday
Jan292008

Building scalable storage into application - Instead of MogileFS OpenAFS etc.

I am planning the scaling of a hosted service, similar to typepad etc. and would appreciate feedback on my plan so far. Looking into scaling storage, I have come accross MogileFS and OpenAFS. My concern with these is I am not at all experienced with them and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and adminster. So, I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey) filename ownerid For adding the replication and scalability, I would add a few more columns: serveroneid servertwoid serverthreeid s3 At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cron job will run through the "files" table, and copy any files that haven't been replicated (where servertwo and serverthree are null) to other servers. Another cron will copy files to Amazon's s3 for an extra backup (if null then copy to s3). Now at the client level, when the page to display the file is loaded, it will know which of the three servers it can pull the file from. If one server goes down, the application will know and use one of the other servers. When storage capacity runs low, another server is added with a big drive, perhaps not even having raid on it. These servers will also be used for php serving through load balancing. I'm probably missing some big drawbacks of this approach but it appeals to me that it should be quite simple to implement and be less complex to adminster than systems like MogileFS which would present a lot more unknowns.

Click to read more ...

Tuesday
Jan292008

When things aren't scalable

OK, I know this site is for scalable web site design. But as there aren't any sites I can find for graceful failure under "slashdotted" like pressure I'll ask here. Does anyone have a sensible way, once you have a "web application" that either won't scale, or can't scale, that you can give some users a good consistent experience and bounce other users to a busy site page. I have seen sites do this to varying degrees, some of which work better than others, but no explanations beyond simply bouncing requests to a "we're busy page server" when you have more than a given number of connections. This is obviously useless as a web page likely requires multiple connection (ignoring keep-alive, pipelining etc) multiple connection to completely render properly. The normal problem is users getting a page and not the "furniture" for that page like images or css. Other problems are having to wait ages to get the busy page or the site being slow even if you do "get in". And some site let a user "in" and then as they browse around they get bounced out suddenly to the busy page. Obviously not being the developer for sites I deal with (I am an infrastructure bod) I can't solve the problem where it should have been pre-emptively solved. That is to say I can't write the code to be scalable or re-write the code to do some simple session filtering or the like (and not being a developer I get dirty looks when I point developers at information like your site ... I can hear them thinking "how dare you suggest I don't know how to code a web site you lowly infrastructure cretin"). Before developer on-line lynch me I should point out that sometimes the cause of not being able to scale a site is that I can't get in new hardware quick enough, but then who knows when you will get slashdotted right ?. So my question applies even when a developer of genius level brilliance has built a unsurpasibly scalable web site for me to run the infrastructure for. My best guess so far is using something like HAProxy to load balance sessions, and then use it's more advanced total session count, and cookie issuing abilities to track users and bounce some at a given "heavy load" point. This isn't ideal as the heavy load point would have to be based on connection counts not server load or server response times, but it's the best I can come up with so far. Also, having mentioned brilliant developers writing great sites not always making my question redundant, could I ask, do people normally think about coping with overload when designing scalable solution - surely they should but I don't see much talk about it. Couldn't a simple Java filter or the equivalent for other things be built into applications ? It'd be nice to have a site that not only scales, but "is nice" when waiting for the infrastructure it runs on to be scaled, which could be several days when you have to purchase new hardware.

Click to read more ...

Monday
Jan282008

DR/BC for web/DB servers

All, I'm looking for a faster/reliable solution for DR/BC as well as for sclability for my web/db servers. I came across VMWare Infrastructure and other products. The I/O performance concerns me to go with virtual servers. I'm also looking into imaging software such as Acrnois. Could anyone share their thoughts on how it's being done with bigger names such as google/youtube etc..? Thank you, Regards, Janakan Rajendran.

Click to read more ...

Sunday
Jan272008

Windows and SQL Server : Receive so much negativity in terms of the Highly Available, Scalable Platform..

I remain neutral, but time and again, when people talk Windows or SQL Server, they seem to consider them unreliable with limits around scalability, performance and availability. And then you start looking at some of the big boys you have listed here in the architectural section and most of them are on Linux, MySQL,Oracle platforms that we dont see Windows and SQL Server in there.. What are your thoughts ?

Click to read more ...

Sunday
Jan272008

Scalability vs Performance vs Availability vs Reliability.. Also scale up vs scale out ???

Where do you draw the line between scalability vs Performance vs High Availability vs Reliability? I guess at the end of the day, we all want to be highly available, great performance and always reliable. So is it safe to say that scalability is the answer ? Also when do you start to think scale out vs scale up ?

Click to read more ...

Friday
Jan252008

Application Database and DAL Architecture

Hi gurus, I'm totally new to this high scalability thing. I'm trying to create a website with scalability in mind (personal project). In my application I'll have forums for different groups of people (each group will have their own forums, members of groups can still post in other groups' forums but each group will mainly be using their forums most of the time). Now, I'm going to start with about 2000 groups with the potential of reaching up to 10000 groups (this is the maximum due to the nature of my application). I was thinking that having all posts in one table will be way too much for one table (esp. that some groups are expected to post hundreds or even thousands times per day, let's say about 500 of the groups, the rest of the groups won't be that active though) as I'll have to index the PostID, ParentPostID, GroupID and PostDate which can produce large indexes (consequentially causing slow inserts) if having everything in one table. So, I'm thinking of a way to divide the posts in many tables, here are some of the things I thought of: 1. Creating a separate table for every group e.g. ForumsPosts_x, where x is the GroupID (which has its own pros and cons, some of the pros that I can have small indexes and also use identity columns, I also assume it should be easy to move the tables to other databases should the application grow. Well, I posted this idea on some other forums and most people told me it's a sign of bad design if I have thousands of tables in my database. I was also concerned how to design my DAL if I do this. Should I use sprocs with dynamic SQL or use SQL text directly in my DAL code and what about the query plan caching if having a large number of tables .. so many problems here!) 2. Put everything in one table and if the site grows move some of the groups to another database (I'm concerned though about having many databases on the same machine, will it affect performance? of course I won't have hundreds of databases on the same machine but may be about 5 or even 10 databases on the same machine) I also have some other questions: I'm going to use ASP.NET for this project, I was planning initially to use SQL Server as a database but I'm worried about the SQL Server part and the cost of growth, should I consider an alternative like MySQL? But how will it perform with ASP.NET though in a high scalability scenario? Any suggestions are highly appreciated...

Click to read more ...

Tuesday
Jan222008

The high scalability community

Hi, First of all; thanks for a creating a GREAT resource on high scalability architecture. For us building high scalability solutions from the west coast of (tiny) Norway good input on the subject isn't always abundant. Which leads me to my next question; Are there any events or conferences on high scalability / SaaS in the US or internationally that any of you would recommend architects or data center managers to attend?

Click to read more ...

Thursday
Jan172008

Load Balancing of web server traffic

How to detect Congestion occurence in the network? Parameter of Load Balancer?

Click to read more ...

Tuesday
Jan152008

Sun to Acquire MySQL

So what are we announcing today? That in addition to acquiring MySQL, Sun will be unveiling new global support offerings into the MySQL marketplace. We'll be investing in both the community, and the marketplace - to accelerate the industry's phase change away from proprietary technology to the new world of open web platforms. Read more on Jonathan Schwartz's Blog What do you think about this?

Click to read more ...