Entries in high availability (13)

Monday
Feb172020

Important Health Checks for your MySQL Master-Slave Servers

In a MySQL master-slave high availability (HA) setup, it is important to continuously monitor the health of the master and slave servers so you can detect potential issues and take corrective actions. In this blog post, we explain some basic health checks you can do on your MySQL master and slave nodes to ensure your setup is healthy. The monitoring program or script must alert the high availability framework in case any of the health checks fails, enabling the high availability framework to take corrective actions in order to ensure service availability.

MySQL Master Server Health Checks

Click to read more ...

Tuesday
Oct292019

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

AWS is the #1 cloud provider for open-source database hosting, and the go-to cloud for MySQL deployments. As organizations continue to migrate to the cloud, it’s important to get in front of performance issues, such as high latency, low throughput, and replication lag with higher distances between your users and cloud infrastructure. While many AWS users default to their managed database solution, Amazon RDS, there are alternatives available that can improve your MySQL performance on AWS through advanced customization options and unlimited EC2 instance type support. ScaleGrid offers a compelling alternative to hosting MySQL on AWS that offers better performance, more control, and no cloud vendor lock-in and the same price as Amazon RDS. In this post, we compare the performance of MySQL Amazon RDS vs. MySQL Hosting at ScaleGrid on AWS High Performance instances.

TLDR

Click to read more ...

Thursday
Oct032019

Redis Cloud Gets Easier with Fully Managed Hosting on Azure

Redis Cloud Gets Easier with Fully Managed Hosting on Azure

ScaleGrid, a rapidly growing leader in the Database-as-a-Service (DBaaS) space, has just launched their new fully managed Redis on Azure service. This Redis management solution allows startups up to enterprise-level organizations automate their Redis operations on Microsoft Azure dedicated cloud servers, alongside their other open source database deployments, including MongoDBMySQL and PostgreSQL.

Redis, the #1 key-value store and top 10 database in the world, has grown by over 300% in popularity over that past 5 years, per the DB-Engines knowledge base. The demand for Redis is skyrocketing across dozens of use cases, particularly for cache, queues, geospatial data, and high speed transactions. This simple database management system makes it very easy to store and retrieve pairs of keys and values, and is commonly paired with other database types to increase the speed and performance of an application. According to the 2019 Open Source Database Report, a majority of Redis deployments are used in conjunction with MySQL, and over half of Redis deployments are used with either PostgreSQL, MongoDB, and Elasticsearch.

ScaleGrid’s Redis hosting service allows these organizations to automate all of their time-consuming management tasks, such as backups, upgrades, scaling, replication, sharding, monitoring, alerts, log rotations, and OS patching, so their DBAs, developers, and DevOps teams can focus on new product development and optimizing performance. Additionally, organizations can customize their Redis persistence and host through their own Azure account which allows them to leverage advanced cloud capabilities like Azure Virtual Networks (VNET), Security Groups, and Reserved Instances to reduce long-term hosting costs up to 60%. 

“Cloud reliability has never been so important,” says Dharshan Rangegowda, Founder and CEO of ScaleGrid. “It’s crucial for organizations to properly configure their Redis deployments for high availability and disaster recovery, as a couple minutes of downtime can be detrimental to a company’s security and reputation.”

ScaleGrid is the only Redis cloud service that allows you to customize your master-slave and cross-datacenter configurations for 100% uptime and availability across 30 different Azure regions. They also allow you to keep full Redis admin access and SSH access to your machines, and you can learn more about their advantages over competitors Compose for Redis, RedisGreen, Redis Labs and Elasticache for Redis on their Compare Redis Providers page.

Monday
Sep162019

Managing High Availability in PostgreSQL – Part III: Patroni

Managing High Availability in PostgreSQL – Part III: Patroni - ScaleGrid Blog

In our previous blog posts, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by Cluster Labs and Replication Manager (repmgr) by 2ndQuadrant. In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment.

Patroni for PostgreSQL

Click to read more ...

Tuesday
Apr162019

MySQL High Availability Framework Explained – Part III: Failover Scenarios

MySQL High Availability Framework Explained – Part III: Failover Scenarios

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

MySQL Failover Scenarios

Scenario 1 – Master MySQL Goes Down

  • The Corosync and Pacemaker framework detects that the master MySQL is no longer available. Pacemaker demotes the master resource and tries to recover with a restart of the MySQL service, if possible.
  • At this point, due to the semisynchronous nature of the replication, all transactions committed on the master have been received by at least one of the slaves.
  • Pacemaker waits until all the received transactions are applied on the slaves and lets the slaves report their promotion scores. The score calculation is done in such a way that the score is ‘0’ if a slave is completely in sync with the master, and is a negative number otherwise.
  • Pacemaker picks the slave that has reported the 0 score and promotes that slave which now assumes the role of master MySQL on which writes are allowed.
  • After slave promotion, the Resource Agent triggers a DNS rerouting module. The module updates the proxy DNS entry with the IP address of the new master, thus, facilitating all application writes to be redirected to the new master.
  • Pacemaker also sets up the available slaves to start replicating from this new master.

Thus, whenever a master MySQL goes down (whether due to a MySQL crash, OS crash, system reboot, etc.), our HA framework detects it and promotes a suitable slave to take over the role of the master. This ensures that the system continues to be available to the applications.

Scenario 2 – Slave MySQL Goes Down

  • The Corosync and Pacemaker framework detects that the slave MySQL is no longer available.
  • Pacemaker tries to recover the resource by trying to restart MySQL on the node. If it comes up, it is added back to the current master as a slave and replication continues.
  • If recovery fails, Pacemaker reports that resource as down – based on which alerts or notifications can be generated. If necessary, the ScaleGrid support team will handle the recovery of this node.
  • In this case, there is no impact on the availability of MySQL services.

Scenario 3 – Network Partition – Network Connectivity Breaks Down Between Master and Slave Nodes

This is a classical problem in any distributed system where each node thinks the other nodes are down, while in reality, only the network communication between the nodes is broken. This scenario is more commonly known as split-brain scenario, and if not handled properly, can lead to more than one node claiming to be a master MySQL which in turn leads to data inconsistencies and corruption.

Let’s use an example to review how our framework deals with split-brain scenarios in the cluster. We assume that due to network issues, the cluster has partitioned into two groups – master in one group and 2 slaves in the other group, and we will denote this as [(M), (S1,S2)].

  • Corosync detects that the master node is not able to communicate with the slave nodes, and the slave nodes can communicate with each other, but not with the master.
  • The master node will not be able to commit any transactions as the semisynchronous replication expects acknowledgement from at least one of the slaves before the master can commit. At the same time, Pacemaker shuts down MySQL on the master node due to lack of quorum based on the Pacemaker setting ‘no-quorum-policy = stop’. Quorum here means a majority of the nodes, or two out of three in a 3-node cluster setup. Since there is only one master node running in this partition of the cluster, the no-quorum-policy setting is triggered leading to the shutdown of the MySQL master.
  • Now, Pacemaker on the partition [(S1), (S2)] detects that there is no master available in the cluster and initiates a promotion process. Assuming that S1 is up to date with the master (as guaranteed by semisynchronous replication), it is then promoted as the new master.
  • Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master.

Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

This concludes our 3-part blog series on the MySQL High Availability (HA) framework using semisynchronous replication and the Corosync plus Pacemaker stack. At ScaleGrid, we offer highly available hosting for MySQL on AWS and MySQL on Azure that is implemented based on the concepts explained in this blog series. Please visit the ScaleGrid Console for a free trial of our solutions.

Wednesday
Apr032019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years from its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Click to read more ...

Tuesday
Feb192019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we’re going to introduce you to the advanced Redis Cluster sharding opportunities, discuss its advantages and limitations, when you should deploy, and how to connect to your Redis Cluster.

Sharding with Redis Cluster

Click to read more ...

Wednesday
Jul092014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Monday
Apr082013

NuoDB's First Experience: Google Compute Engine - 1.8 Million Transactions Per Second

This is a repost of the blog entry written by NuoDB's Tommy Reilly.  

We at NuoDB were recently given the opportunity to kick the tires on the Google Compute Engine by our friends over at Google. You can watch the entire Google Developer Live Session by clicking here.  In order to access the capabilities of GCE we decided to run the same YCSB based benchmark we ran at our General Availability Launch back in January. For those of you who missed it we demonstrated running the YCSB benchmark on a 24 machine cluster running on our private cloud in the NuoDB datacenter. The salient results were 1.7 million transactions per second with sub-millisecond latencies...

Click to read more ...

Sunday
Aug172008

Wuala - P2P Online Storage Cloud

How do you design a reliable distributed file system when the expected availability of the individual nodes are only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of a Swiss startup Caleido will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network. The goal of Wua.la is to provide distributed online storage that is:

  • large
  • scalable
  • reliable
  • secure
by harnessing the idle resources of participating computers. This challenge is an old dream of computer science. In fact as Andrew Tanenbaum wrote in 1995: "The design of a world-wide, fully transparent distributed filesystem fot simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader" After three years of research and development at at ETH Zurich, the Swiss Federal Institute of Technology on a distributed storage system, Caleido is ready to unveil the result: Wuala. Wuala is a new way of storing, sharing, and publishing files on the internet. It enables its users to trade parts of their local storage for online storage and it allows us to provide a better service for free. In this Google Tech Talk, Dominik will explain what Wuala is and how it works, and he will also show a demo. The availability problem is solved by redundancy (just like in Google File System). However simple replication techniques would result in too much overhead because of the low availability of the nodes. Instead Wuala employs erasure coding and splits the data into small pieces. Optimal erasure codes produce n/r fragments where any n fragments is sufficient to recover the original message. These pieces are then distributed in the P2P network providing good availability at a reasonable overhead. The P2P network consists of client, storage and routing nodes. The Wuala architecture uses a mix of regular and random graphs to optimize routing. Dominik also explains how Wuala architecture is designed to provide security and fairness. Wuala employs the 128 bit AES algorithm for encryption and the 2048 bit RSA algorithm for authentication. If you're interested in how Wuala manages encryption, have a look at their publication on Cryptree. They have also implemented distributed reputation audit and maintenance functions. Check out the Tech Talk! It is worth the time!

Click to read more ...