High Scalability

Kristi Anderson | PostgreSQL Automatic Failover

MySQL High Availability Framework Explained – Part III: Failover Scenarios

Tuesday, April 16, 2019 at 9:33AM

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

MySQL Failover Scenarios

Scenario 1 – Master MySQL Goes Down

The Corosync and Pacemaker framework detects that the master MySQL is no longer available. Pacemaker demotes the master resource and tries to recover with a restart of the MySQL service, if possible.
At this point, due to the semisynchronous nature of the replication, all transactions committed on the master have been received by at least one of the slaves.
Pacemaker waits until all the received transactions are applied on the slaves and then lets the slaves report their promotion scores. The score is 0 if a slave is completely in sync with the master and negative otherwise.
Pacemaker picks the slave that reported a score of 0 and promotes it. That slave now assumes the role of master MySQL, on which writes are allowed.
After the slave is promoted, the Resource Agent triggers a DNS rerouting module. The module updates the proxy DNS entry with the IP address of the new master, so that all application writes are redirected to the new master.
Pacemaker also sets up the available slaves to start replicating from this new master.
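The promotion steps above can be modeled in a few lines of Python. This is a minimal, hypothetical sketch, not the framework's actual Resource Agent: it assumes a slave's score is the negated number of committed transactions it has not yet applied (so an in-sync slave scores 0), and the names `Slave`, `promotion_score`, `fail_over` and the `mysql-proxy` DNS key are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slave:
    name: str
    ip: str
    received_txns: int  # transactions received from the old master (hypothetical counter)
    applied_txns: int  # transactions already applied locally
    master: Optional[str] = "M"  # node this slave replicates from; None once promoted


def promotion_score(slave: Slave, master_committed: int) -> int:
    """0 if the slave has applied everything the master committed, else negative.

    Assumption: the score is the negated count of committed transactions the
    slave has not applied; the real resource agent's formula may differ.
    """
    return -(master_committed - slave.applied_txns)


def fail_over(slaves: List[Slave], master_committed: int, dns: Dict[str, str]) -> Slave:
    # Wait until every received transaction is applied (simulated instantly here).
    for s in slaves:
        s.applied_txns = s.received_txns
    # Collect promotion scores and keep only fully in-sync slaves (score 0).
    scores = {s.name: promotion_score(s, master_committed) for s in slaves}
    candidates = [s for s in slaves if scores[s.name] == 0]
    if not candidates:
        raise RuntimeError(f"no slave is in sync with the old master: {scores}")
    new_master = candidates[0]  # tie-breaking between several 0 scores is arbitrary here
    new_master.master = None  # promoted: now accepts writes
    # Hypothetical DNS rerouting: point the proxy entry at the new master's IP.
    dns["mysql-proxy"] = new_master.ip
    # Re-point the remaining slaves at the new master.
    for s in slaves:
        if s is not new_master:
            s.master = new_master.name
    return new_master
```

For example, if the old master committed 100 transactions, S1 received all 100 but had applied only 97, and S2 received and applied 98, then S1 scores 0 after catching up while S2 scores -2, so S1 is promoted and S2 is re-pointed to it.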

Thus, whenever a master MySQL goes down (whether due to a MySQL crash, OS crash, system reboot, etc.), our HA framework detects it and promotes a suitable slave to take over the role of the master. This ensures that the system continues to be available to the applications.

Scenario 2 – Slave MySQL Goes Down

The Corosync and Pacemaker framework detects that the slave MySQL is no longer available.
Pacemaker tries to recover the resource by restarting MySQL on the node. If MySQL comes up, the node is added back to the current master as a slave and replication continues.
If recovery fails, Pacemaker reports the resource as down, and alerts or notifications can be generated based on that status. If necessary, the ScaleGrid support team will handle the recovery of this node.
In this case, there is no impact on the availability of MySQL services.
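The recover-or-report behavior for a failed slave can be sketched as a bounded retry loop. The callables `restart_mysql`, `rejoin_master` and `alert`, and the `max_attempts` limit, are hypothetical stand-ins for what Pacemaker and the resource agent actually do; note that the master is never touched in either branch.

```python
from typing import Callable


def recover_slave(
    node: str,
    restart_mysql: Callable[[str], bool],
    rejoin_master: Callable[[str], None],
    alert: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Restart MySQL on a failed slave; rejoin it on success, else report it down.

    All callables and max_attempts are hypothetical stand-ins, not Pacemaker APIs.
    """
    for _ in range(max_attempts):
        if restart_mysql(node):
            # MySQL came back: attach it to the current master so replication continues.
            rejoin_master(node)
            return "running"
    # Recovery failed: mark the resource down and raise an alert for operators.
    alert(f"{node}: MySQL resource is down after {max_attempts} restart attempts")
    return "down"
```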

Scenario 3 – Network Partition – Network Connectivity Breaks Down Between Master and Slave Nodes

This is a classic problem in any distributed system: each node thinks the other nodes are down, while in reality only the network communication between the nodes is broken. This is commonly known as a split-brain scenario, and if it is not handled properly, more than one node can end up claiming to be the master MySQL, which in turn leads to data inconsistencies and corruption.

Let’s use an example to review how our framework deals with split-brain scenarios in the cluster. We assume that due to network issues, the cluster has partitioned into two groups – master in one group and 2 slaves in the other group, and we will denote this as [(M), (S1,S2)].

Corosync detects that the master node is not able to communicate with the slave nodes, and the slave nodes can communicate with each other, but not with the master.
The master node cannot commit any transactions, because semisynchronous replication requires an acknowledgement from at least one of the slaves before the master can commit. At the same time, Pacemaker shuts down MySQL on the master node due to lack of quorum, based on the Pacemaker setting ‘no-quorum-policy = stop’. Quorum here means a majority of the nodes, i.e., two out of three in a 3-node cluster. Since the master is the only node in its partition, that partition lacks quorum, so the no-quorum-policy setting takes effect and MySQL on the master is shut down.
Now, Pacemaker on the partition (S1, S2) detects that there is no master available in the cluster and initiates a promotion process. Assuming that S1 is the slave that is up to date with the master (semisynchronous replication guarantees that at least one slave has received every committed transaction), S1 is promoted as the new master.
Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master.
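The quorum arithmetic behind this scenario can be checked with a small Python model. It is an illustrative sketch under stated assumptions: ‘no-quorum-policy = stop’ is modeled as stopping MySQL on every node of a partition without a majority, the set of in-sync slaves is supplied by the caller, and MySQL's semisynchronous timeout (`rpl_semi_sync_master_timeout`, after which a master can fall back to asynchronous replication) is ignored. The function names are hypothetical.

```python
from typing import Dict, List, Set


def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """Majority quorum: strictly more than half of all cluster nodes."""
    return partition_size > cluster_size // 2


def master_can_commit(partition: Set[str], master: str) -> bool:
    """A semisynchronous commit needs an ack from at least one reachable slave."""
    return master in partition and len(partition) > 1


def resolve_partitions(
    partitions: List[Set[str]], master: str, in_sync: Set[str]
) -> Dict[str, str]:
    """Return the role each node ends up with after a network partition.

    Illustrative assumptions: 'no-quorum-policy = stop' stops MySQL on every
    node of a partition without quorum, and a quorate partition that lost
    the master promotes one of its in-sync slaves.
    """
    cluster_size = sum(len(p) for p in partitions)
    roles: Dict[str, str] = {}
    for part in partitions:
        if not has_quorum(len(part), cluster_size):
            for node in part:
                roles[node] = "stopped"  # no-quorum-policy = stop
            continue
        if master in part:
            new_master = master
        else:
            candidates = sorted(part & in_sync)
            if not candidates:
                raise RuntimeError(f"no in-sync slave in partition {sorted(part)}")
            new_master = candidates[0]
        for node in part:
            roles[node] = "master" if node == new_master else f"slave of {new_master}"
    return roles
```

For the [(M), (S1,S2)] example with S1 in sync, the model stops M, promotes S1 and makes S2 a slave of S1. Because quorum requires a strict majority, a 4-node cluster split 2/2 leaves neither side with quorum, which is one reason clusters like this typically use an odd number of nodes.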

Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

This concludes our 3-part blog series on the MySQL High Availability (HA) framework using semisynchronous replication and the Corosync plus Pacemaker stack. At ScaleGrid, we offer highly available hosting for MySQL on AWS and MySQL on Azure that is implemented based on the concepts explained in this blog series. Please visit the ScaleGrid Console for a free trial of our solutions.

Kristi Anderson | High Availability Framework

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

Wednesday, July 9, 2014 at 9:00AM

“You just can't have it all” is a phrase that most of us are accustomed to hearing, and many still believe it holds true for the speed, scale and cost of processing data. High-speed data processing requires more memory, which increases cost, because memory is on average more expensive than commodity disk drives. The idea that data systems cannot reliably provide both memory and fast access, let alone at the right cost, has long been debated, and computer scientist Eric Brewer cemented the notion of such limitations when he introduced the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Nati Shalom | SSD

Enterprise Architecture Conference by John Zachman: Johannesburg (25th March), Cape Town (27th March), Dubai (23rd March)

Wednesday, February 25, 2009 at 2:36PM

Why You Need To Attend THIS CONFERENCE

• Understand the multi-dimensional view of business-technology alignment
• A sense of urgency for aggressively pursuing Enterprise Architecture
• A "language" (i.e., a Framework) for improving enterprise communications about architecture issues
• An understanding of the cultural changes implied by process evolution, and how to effectively use the framework to anchor processes and procedures for delivering service and support for applications
• An understanding of basic Enterprise physics
• Recommendations for Sr. Managers to understand the political realities and organizational resistance in realizing the EA vision, and some excellent advice for overcoming these barriers
• A number of practical examples of how to work with people who affect decisions on EA implementation
• How to create value for your organization by systematically recording assets, processes, connectivity, people, timing and motivation through a simple framework

For registrations, group discounts or further details please contact Caroline.smith@icmgworld.com
http://www.ITArchitectureSummit.com

carolines5 | General Discussion, enterprise architecture

Learn how to manage change and complexity by Zachman Live.

Wednesday, February 25, 2009 at 2:33PM

John Zachman (the father of enterprise architecture): given this renascent interest, who better to explain the principles behind Enterprise Architecture than the man himself, John Zachman, the originator of the "Zachman Framework for Enterprise Architecture"? Join this workshop in Johannesburg on 25th March 09 and Cape Town on 27th March 09, and Mr. Zachman will explain how and why Enterprise Architecture provides measure, such an implementation is a daunting task with opportunities to fail lurking in many places.

For more details visit http://www.ITArchitectureSummit.com
For registrations, group discounts or further details please contact Caroline.smith@icmgworld.com

carolines5