Entries in administration (10)

Monday
Feb172020

Important Health Checks for your MySQL Master-Slave Servers

In a MySQL master-slave high availability (HA) setup, it is important to continuously monitor the health of the master and slave servers so you can detect potential issues and take corrective actions. In this blog post, we explain some basic health checks you can do on your MySQL master and slave nodes to ensure your setup is healthy. The monitoring program or script must alert the high availability framework in case any of the health checks fails, enabling the high availability framework to take corrective actions in order to ensure service availability.

MySQL Master Server Health Checks

Click to read more ...

Wednesday
Jan082020

PostgreSQL Connection Pooling: Part 2 – PgBouncer

PostgreSQL Connection Pooling: Part 2 – PgBouncer

When it comes to connection pooling in the PostgreSQL world, PgBouncer is probably the most popular option. It’s a very simple utility that does exactly one thing – it sits between the database and the clients and speaks the PostgreSQL protocol, emulating a PostgreSQL server. A client connects to PgBouncer with the exact same syntax it would use when connecting directly to PostgreSQL – PgBouncer is essentially invisible.

Click to read more ...

Tuesday
Oct292019

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

AWS is the #1 cloud provider for open-source database hosting, and the go-to cloud for MySQL deployments. As organizations continue to migrate to the cloud, it’s important to get in front of performance issues, such as high latency, low throughput, and replication lag with higher distances between your users and cloud infrastructure. While many AWS users default to their managed database solution, Amazon RDS, there are alternatives available that can improve your MySQL performance on AWS through advanced customization options and unlimited EC2 instance type support. ScaleGrid offers a compelling alternative to hosting MySQL on AWS that offers better performance, more control, and no cloud vendor lock-in and the same price as Amazon RDS. In this post, we compare the performance of MySQL Amazon RDS vs. MySQL Hosting at ScaleGrid on AWS High Performance instances.

TLDR

Click to read more ...

Tuesday
Apr162019

MySQL High Availability Framework Explained – Part III: Failover Scenarios

MySQL High Availability Framework Explained – Part III: Failover Scenarios

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

MySQL Failover Scenarios

Scenario 1 – Master MySQL Goes Down

  • The Corosync and Pacemaker framework detects that the master MySQL is no longer available. Pacemaker demotes the master resource and tries to recover with a restart of the MySQL service, if possible.
  • At this point, due to the semisynchronous nature of the replication, all transactions committed on the master have been received by at least one of the slaves.
  • Pacemaker waits until all the received transactions are applied on the slaves and lets the slaves report their promotion scores. The score calculation is done in such a way that the score is ‘0’ if a slave is completely in sync with the master, and is a negative number otherwise.
  • Pacemaker picks the slave that has reported the 0 score and promotes that slave which now assumes the role of master MySQL on which writes are allowed.
  • After slave promotion, the Resource Agent triggers a DNS rerouting module. The module updates the proxy DNS entry with the IP address of the new master, thus, facilitating all application writes to be redirected to the new master.
  • Pacemaker also sets up the available slaves to start replicating from this new master.

Thus, whenever a master MySQL goes down (whether due to a MySQL crash, OS crash, system reboot, etc.), our HA framework detects it and promotes a suitable slave to take over the role of the master. This ensures that the system continues to be available to the applications.

Scenario 2 – Slave MySQL Goes Down

  • The Corosync and Pacemaker framework detects that the slave MySQL is no longer available.
  • Pacemaker tries to recover the resource by trying to restart MySQL on the node. If it comes up, it is added back to the current master as a slave and replication continues.
  • If recovery fails, Pacemaker reports that resource as down – based on which alerts or notifications can be generated. If necessary, the ScaleGrid support team will handle the recovery of this node.
  • In this case, there is no impact on the availability of MySQL services.

Scenario 3 – Network Partition – Network Connectivity Breaks Down Between Master and Slave Nodes

This is a classical problem in any distributed system where each node thinks the other nodes are down, while in reality, only the network communication between the nodes is broken. This scenario is more commonly known as split-brain scenario, and if not handled properly, can lead to more than one node claiming to be a master MySQL which in turn leads to data inconsistencies and corruption.

Let’s use an example to review how our framework deals with split-brain scenarios in the cluster. We assume that due to network issues, the cluster has partitioned into two groups – master in one group and 2 slaves in the other group, and we will denote this as [(M), (S1,S2)].

  • Corosync detects that the master node is not able to communicate with the slave nodes, and the slave nodes can communicate with each other, but not with the master.
  • The master node will not be able to commit any transactions as the semisynchronous replication expects acknowledgement from at least one of the slaves before the master can commit. At the same time, Pacemaker shuts down MySQL on the master node due to lack of quorum based on the Pacemaker setting ‘no-quorum-policy = stop’. Quorum here means a majority of the nodes, or two out of three in a 3-node cluster setup. Since there is only one master node running in this partition of the cluster, the no-quorum-policy setting is triggered leading to the shutdown of the MySQL master.
  • Now, Pacemaker on the partition [(S1), (S2)] detects that there is no master available in the cluster and initiates a promotion process. Assuming that S1 is up to date with the master (as guaranteed by semisynchronous replication), it is then promoted as the new master.
  • Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master.

Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

This concludes our 3-part blog series on the MySQL High Availability (HA) framework using semisynchronous replication and the Corosync plus Pacemaker stack. At ScaleGrid, we offer highly available hosting for MySQL on AWS and MySQL on Azure that is implemented based on the concepts explained in this blog series. Please visit the ScaleGrid Console for a free trial of our solutions.

Wednesday
Apr032019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years from its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Click to read more ...

Tuesday
Jan082019

Slow MySQL Start Time in GTID mode? Binary Log File Size May Be The Issue

Have you been experiencing slow MySQL startup times in GTID mode? We recently ran into this issue on one of our MySQL hosting deployments and set out to solve the problem. In this blog, we break down the issue that could be slowing down your MySQL restart times, how to debug for your deployment, and what you can do to decrease your start time and improve your understanding of GTID-based replication.

How We Found The Problem

Click to read more ...

Monday
Jun062011

NoSQL Pain? Learn How to Read/write Scale Without a Complete Re-write

Lately I've been reading more cases were different people have started to realize the limitations of the NoSQL promise to database scalability. Note the references below:

Take MongoDB for example. It's damn fast, but it doesn't really know how to save data reliably to disk. I've had it set up in a replica pair to mitigate that risk. Guess what - both servers in the pair failed and corrupted their data files at the same day.

It appears that for many, the switch to NoSQL can be rather painful. IMO that doesn't necessarily mean that NoSQL is wrong in general, but it's a combination of 1) lack of maturity 2) not the right tool for the job.

That brings the question of what's the alternative solution?

In the following post I tried to summarize the lessons from  Ronnie Bodinger (Head of IT at Avanza Bank AB) presentation on how they turned their current read-mostly scale architecture into a complete read/write scale without a complete re-writing of their existing application and while keeping the database as-is.

The lessons learned:

  • Minimize the change by clearly Identifying the scalability hotspots
  • Keep the database as is
  • Put an In Memory Data Grid as a front end to the database
  • Use write-behind to reduce the synchronization overhead
  • Use O/R mapping to map the data back into its original format
  • Use standard Java API and framework to leverage existing skillset
  • Use two parallel (old/new) sites to enable gradual transition
  • Use RAM for high performance access and disk for long term storage
  • Use commodity Database and HW

For a more detailed explanation read more here.

Monday
Jul122010

Creating Scalable Digital Libraries

Like many other media content providers, libraries and museums are increasingly moving their content onto the Web.  While the move itself is no easy process (with digitization, web development, and training costs), being able to successfully deliver content to a wide audience is an ongoing concern, particularly for large libraries.

Much of the concern is financial, as most libraries do not have the internal budget or outside investors that for-profit businesses enjoy.  Even large university libraries will face serious budget constraints that even other university departments, such as science and technology would not face.

Creating a scalable infrastructure and also distributing a large digital collection that can handle multiple requests, requires planning that many librarians have not even imagined.  They must stop thinking in terms of "one-item-per-customer" and start thinking in terms of numerous users accessing the same information simultaneously.

Click to read more ...

Tuesday
May272008

Secure Remote Administration for Large-Scale Networks

This website has been a great resource for helping me to understand the successful (and failed) scalable network designs from organizations that have actually done it, but I haven't seen any explicite explanations about secure remote administration of these systems. I understand that the *nix people love to SSH, and the windows gang has their RDP, but how does one go about creating a network architecture that both allows one to manage their systems and does its best to avoid hacker interest? As I imagine, no big website will have the SSH/RDP/FTP ports open on the web server, so how is it that they go about remotely administering their geographically diverse groups of servers securely?

Click to read more ...

Thursday
Feb072008

clusteradmin.blogspot.com - blog about building and administering clusters

A blog about cluster administration. Written by a System Administrator working at HPC (High Performance Computing) data-center, mostly dealing with PC clusters (100s of servers), SMP machines and distributed installations. The blog concentrates on software/configuration/installation management systems, load balancers, monitoring and other cluster-related solutions.

Click to read more ...