Thursday
Aug092007
Lots of questions for high scalability / high availability

Hey,
I do have a website that I would like to scale. Right now we have 10 servers but this does not scale well.
I know how to deal with my apache web servers but have problems with sql servers.
I would like to use the "scale out" system and add servers when we need. We have over 100Gb of data for mysql and we tried to have around 20G per server. It works well except that if a server goes down then 1/5 of the user can't access the website. We could use replication but we would need to at least double sql servers to replicate each server. And maybe in the future it's not gonna be enough we would need maybe 3 slaves per master ... well I don't really like this idea.
I would prefer to have 8 servers that all deal with data from the 5 servers we have right now and then we could add new servers when we need. I looked at NFS but that does not seem to be a good idea for SQL servers ? Can you confirm?
I do have a website that I would like to scale. Right now we have 10 servers but this does not scale well.
I know how to deal with my apache web servers but have problems with sql servers.
I would like to use the "scale out" system and add servers when we need. We have over 100Gb of data for mysql and we tried to have around 20G per server. It works well except that if a server goes down then 1/5 of the user can't access the website. We could use replication but we would need to at least double sql servers to replicate each server. And maybe in the future it's not gonna be enough we would need maybe 3 slaves per master ... well I don't really like this idea.
I would prefer to have 8 servers that all deal with data from the 5 servers we have right now and then we could add new servers when we need. I looked at NFS but that does not seem to be a good idea for SQL servers ? Can you confirm?
Reader Comments (4)
There are so many different strategies. Can you characterize your traffic? What percent is reads? What percent is writes? Are you storing large media files or small objects? How many qps?
Some possible options:
Perhaps you can setup caching servers to remove write traffic from your existing servers and then reallocated your existing servers in 5 shards, in groups of two.
Perhaps you can split your read from your write traffic and denormalize the read database?
Hey
thanks for your reply. I do not have large files, just a dynamic website.
For mysql, we have 70% read and 30% insert, update & delete.
I think we can increase number of read and reduce number of write by using more cache and writing less often.
My main concern about splitting servers into read & write is that it's not really scalable for the write servers. If
we use one server for write, what happen when this server reaches his limit?
I would prefer a "cluster" solution where we can add servers when we need new servers but I don't really know how to set it up and what would be the best solution?
Do you have ideas?
thanks
There are 2 ways of clustering MySQL:
1. Use MySQL clustering.
2. Use RedHat Clustering Services (RHCS)
Without knowing what your current OS is, I'll assume you use Redhat.
MySQL clustering can be complex, as it uses a 3 tier approach, which are referred to as SQL nodes, Data nodes and the management server. In an ideal world, for full redundancy and high performance, you would need a minimum of 2 SQL nodes, 2 Data nodes, and 2 management servers. Once you have passed the pain of setting it all up, you can quite easily add additional Data nodes or SQL nodes to scale out. Although this might sound like the best solution, it requires considerable knowledge in MySQL, and some serious skills. I would also highly recommend putting in some sort of monitoring system for each node.
Another easier alternative, is to use Redhat Clustering Services (RHCS). This service effectively has the capabilities of clustering any other services. So to configure it, you would need to run 2 nodes with the MySQL daemon, and use RHCS to cluster those services. This does require some sort of shared storage like a DAS, SAN or NAS. This also works best in an active/passive setup, so one will see no performance gains.
BUT
If you use the sharding effect, IE seperating the read parts of the database from the write, you will see significant performance gains from the database.
So in your case, I would recommend the following:
1. Setup an active/passive cluster for the write part of your database using RHCS, providing redundancy. This will require 2 servers, and a shared storage.
2. Shard the read parts of the database over 3 or more nodes, that can be loadbalanced, using a hardware load balancer. The load balancer will provide redundancy via health checks on the nodes which are part of its balancing pool.
Things to rememeber:
1. I do realise that you will say, " but what happens when my write node hits capacity". This is unfortunately a drawback with most database applications. When resources run out due to a high volume of writes to the database, this can potentially be terminal. One needs to start getting artistic in your database architecture and code. Also you can use a number of utilities like indexing, cache and table structure changes to improve performance. After that you have further options, like adding more resources like memory and processing power. I always believe one should attempt to fix the bottelneck before adding additional resources, because if the original issue is not fixed, and adding resources is performed, this could compound the problem further down the line, which is bad.
2. You will need to replicate the changes made to the write node to the read nodes. But this is asynchronous replication, and can be managed by something like rsynch, which is easy to use and configure.
Also I hate to say this, but if this is not a route you want to follow, you should look at database technologies that provide true distributed processing, like Oracle. The limitation of MySQL, is that when you split it up over a number of nodes, each instance is addressed as it's own. Oracle has the capability of have a number of nodes sharing an instance, which makes it very easy to scale, as the application never needs to change.
NFS, not a good idea if you have high traffic volumes, it can be cumbersome, and be a potential bottleneck.
> There are 2 ways of clustering MySQL
Thanks chadrodinis for your excellent reply. Isn't clustering more for HA than performance though? MySQL doesn't parallelize IO access, so there's not much of a win.
The High Availability MySQL site talks about this a bit in their http://mysqlha.blogspot.com/2007/06/top-5-wishes-for-mysql.html">Top 5 wishes for MySQL article.