High Scalability -

10 Comments |

Permalink |

tagged

Azure,

Cloud Provider,

ROI,

aws,

ec2 in

AWS,

Azure,

public cloud

Tuesday

Oct292019

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

Tuesday, October 29, 2019 at 9:25AM

AWS is the #1 cloud provider for open-source database hosting, and the go-to cloud for MySQL deployments. As organizations continue to migrate to the cloud, it’s important to get in front of performance issues, such as high latency, low throughput, and replication lag with higher distances between your users and cloud infrastructure. While many AWS users default to their managed database solution, Amazon RDS, there are alternatives available that can improve your MySQL performance on AWS through advanced customization options and unlimited EC2 instance type support. ScaleGrid offers a compelling alternative to hosting MySQL on AWS that offers better performance, more control, and no cloud vendor lock-in and the same price as Amazon RDS. In this post, we compare the performance of MySQL Amazon RDS vs. MySQL Hosting at ScaleGrid on AWS High Performance instances.

TLDR

3 Comments |

Permalink |

tagged

Cloud,

aws,

ec2,

mysql,

sql in

AWS,

Database,

Latency,

MySQL,

Performance,

RDBMS,

SSD,

amazon,

database replication,

database scalability,

sql,

testing

Thursday

Jun272019

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Thursday, June 27, 2019 at 9:33AM

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:

1 Comment |

Permalink |

tagged

Azure,

Db2,

Google Cloud Platform,

aws,

mysql,

oracle,

postgresql,

redis in

AWS,

ClickHouse,

Database,

HBase,

Memcached,

MongoDB,

MySQL,

Oracle,

Postgres,

RDBMS,

Redis,

data,

db2,

google,

hybrid,

nosql,

sql,

sql server 2008

Tuesday

Apr162019

MySQL High Availability Framework Explained – Part III: Failover Scenarios

Tuesday, April 16, 2019 at 9:33AM

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

MySQL Failover Scenarios

Scenario 1 – Master MySQL Goes Down

The Corosync and Pacemaker framework detects that the master MySQL is no longer available. Pacemaker demotes the master resource and tries to recover with a restart of the MySQL service, if possible.
At this point, due to the semisynchronous nature of the replication, all transactions committed on the master have been received by at least one of the slaves.
Pacemaker waits until all the received transactions are applied on the slaves and lets the slaves report their promotion scores. The score calculation is done in such a way that the score is ‘0’ if a slave is completely in sync with the master, and is a negative number otherwise.
Pacemaker picks the slave that has reported the 0 score and promotes that slave which now assumes the role of master MySQL on which writes are allowed.
After slave promotion, the Resource Agent triggers a DNS rerouting module. The module updates the proxy DNS entry with the IP address of the new master, thus, facilitating all application writes to be redirected to the new master.
Pacemaker also sets up the available slaves to start replicating from this new master.

Thus, whenever a master MySQL goes down (whether due to a MySQL crash, OS crash, system reboot, etc.), our HA framework detects it and promotes a suitable slave to take over the role of the master. This ensures that the system continues to be available to the applications.

Scenario 2 – Slave MySQL Goes Down

The Corosync and Pacemaker framework detects that the slave MySQL is no longer available.
Pacemaker tries to recover the resource by trying to restart MySQL on the node. If it comes up, it is added back to the current master as a slave and replication continues.
If recovery fails, Pacemaker reports that resource as down – based on which alerts or notifications can be generated. If necessary, the ScaleGrid support team will handle the recovery of this node.
In this case, there is no impact on the availability of MySQL services.

Scenario 3 – Network Partition – Network Connectivity Breaks Down Between Master and Slave Nodes

This is a classical problem in any distributed system where each node thinks the other nodes are down, while in reality, only the network communication between the nodes is broken. This scenario is more commonly known as split-brain scenario, and if not handled properly, can lead to more than one node claiming to be a master MySQL which in turn leads to data inconsistencies and corruption.

Let’s use an example to review how our framework deals with split-brain scenarios in the cluster. We assume that due to network issues, the cluster has partitioned into two groups – master in one group and 2 slaves in the other group, and we will denote this as [(M), (S1,S2)].

Corosync detects that the master node is not able to communicate with the slave nodes, and the slave nodes can communicate with each other, but not with the master.
The master node will not be able to commit any transactions as the semisynchronous replication expects acknowledgement from at least one of the slaves before the master can commit. At the same time, Pacemaker shuts down MySQL on the master node due to lack of quorum based on the Pacemaker setting ‘no-quorum-policy = stop’. Quorum here means a majority of the nodes, or two out of three in a 3-node cluster setup. Since there is only one master node running in this partition of the cluster, the no-quorum-policy setting is triggered leading to the shutdown of the MySQL master.
Now, Pacemaker on the partition [(S1), (S2)] detects that there is no master available in the cluster and initiates a promotion process. Assuming that S1 is up to date with the master (as guaranteed by semisynchronous replication), it is then promoted as the new master.
Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master.

Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

This concludes our 3-part blog series on the MySQL High Availability (HA) framework using semisynchronous replication and the Corosync plus Pacemaker stack. At ScaleGrid, we offer highly available hosting for MySQL on AWS and MySQL on Azure that is implemented based on the concepts explained in this blog series. Please visit the ScaleGrid Console for a free trial of our solutions.

Permalink |

High Availability Framework,

tagged

Master Slave,

MySQL Crash,

MySQL Failover Scenarios,

MySQL High Availability,

MySQL Semisynchronous Replication,

MysQL Hosting,

Network Connectivity,

Replication,

Split-Brain,

database,

sql in

AWS,

Database,

Geo-distributed Clusters,

Failure Analysis,

MySQL,

database replication,

database scalability,

enterprise architecture,

nodes,

sql,

uptime

Wednesday

Apr032019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

Wednesday, April 3, 2019 at 9:05AM

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years from its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Permalink |

tagged

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Tuesday, February 19, 2019 at 9:15AM

Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we’re going to introduce you to the advanced Redis Cluster sharding opportunities, discuss its advantages and limitations, when you should deploy, and how to connect to your Redis Cluster.

Sharding with Redis Cluster

Permalink |

Clustered Storage System,

tagged

Database-as-a-Service,

redis in

AWS,

Blog,

Caching,

Java,

Redis,

Shard,

cache,

cloud computing,

cloud storage,

cluster,

cluster storage system,

clusters,

data,

data management,

database replication,

database scalability,

horizontal scalability,

nodejs,

nodes,

nosql,

shards

Wednesday

Sep192018

How do you explain the unreasonable effectiveness of cloud security?

Wednesday, September 19, 2018 at 9:01AM

With the enormous attack surface of cloud providers like AWS, Azure, and GCP, why aren't there more security problems? Data breaches and cyber attacks occur daily. How do you explain the unreasonable effectiveness of cloud security?

Google has an ebook on their security approach; Microsoft has some web pages. Both are the equivalent of that person who is disgustingly healthy and you ask them how they do it and they say "I don't know. I just eat right, exercise, and get plenty of sleep." Not all that useful. Most of us want a hack, a trick to good health. Who wants to eat right?

I'm sure Amazon also eats right, exercises, and gets plenty of sleep (probably not the people who work there), but AWS also has a secret that when that disgustingly healthy person starts talking about at a party, you just can't help leaning in and listening.

What's the trick to 6-pack security? Proving systems correct. Does your datacenter do that? I didn't think so. AWS does.

Dr. Byron Cook gave an enthusiastic talk on Formal Reasoning about the Security of Amazon Web Service. He's clearly excited about finally applying his research in a real-world setting. This is the trojan horse the FloC (Federated Logic Conference) community has been waiting for. It's almost as if he's a FLoC guy working at AWS rather than an AWS guy giving a FLoC talk.

The main take-aways for me were:

4 Comments |

Permalink |

AWS,

security

Monday

Jun082015

Leveraging AWS to Build a Scalable Data Pipeline

Monday, June 8, 2015 at 10:06AM

While at Netflix and LinkedIn Siddharth "Sid" Anand wrote some great articles for HS. Sid is back, this time as a Data Architect at Agari. Original article is here.

Data-rich companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. In order to meet strict requirements on data security, fault-tolerance, cost control, job scalability, and uptime, they need to closely manage their core technology. Like serving systems (e.g. web application servers and OLTP databases) that need to be up 24x7 to display content to users the world over, data pipelines need to be up and running in order to pick the most engaging and up-to-date content to display. In other words, updated ranking models, new content recommendations, and the like are what make data pipelines an integral part of an end user’s web experience by picking engaging, personalized content.

Agari, a data-driven email security company, is no different in its demand for a low-latency, reliable, and scalable data pipeline. It must process a flood of inbound email and email authentication metrics, analyze this data in a timely manner, often enriching it with 3rd party data and model-driven derived data, and publish findings. One twist is that Agari, unlike the companies listed above, operates completely in the cloud, specifically in AWS. This has turned out to be more a boon than a disadvantage.

Below is one such data pipeline used at Agari.

Agari Data Pipeline

Data Flow

2 Comments |

Permalink |

AWS,

BigData

Wednesday

Oct152014

Using a SSD Cache in Front of EBS Boosted Throughput by 50%, for Free

Wednesday, October 15, 2014 at 8:56AM

Using EBS has lots of advantages--reliability, snapshotting, resizing--but overcoming the performance problems by using Provisioned IOPS is expensive.

Swrve, an integrated marketing and A/B testing and optimization platform for mobile apps, did something clever. They are using the c3.xlarge EC2 instances, that have two 40GB SSD devices per instance, as a cache.

They found through testing RAID-0 striping using a 4-way stripe along with enhanceio, effectively increased throughput by over 50%, for free. With no filesystem corruption problems.

How is it free? "We were planning on upgrading to the C3 class of instance anyway, and sticking with EBS as the backing store. Once you’re using an instance which has SSD ephemeral storage, there are no additional fees to use that hardware."

For great analysis, lots of juicy details, graphs, and configuration commands, please take a look at How we increased our EC2 event throughput by 50%, for free.

Permalink |

AWS,

Strategy,

flash

Wednesday

Aug202014

Part 2: The Cloud Does Equal High performance

Wednesday, August 20, 2014 at 8:57AM

This a guest post by Anshu Prateek, Tech Lead, DevOps at Aerospike and Rajkumar Iyer, Member of the Technical Staff at Aerospike.

In our first post we busted the myth that cloud != high performance and outlined the steps to 1 Million TPS (100% reads in RAM) on 1 Amazon EC2 instance for just $1.68/hr. In this post we evaluate the performance of 4 Amazon instances when running a 4 node Aerospike cluster in RAM with 5 different read/write workloads and show that the r3.2xlarge instance delivers the best price/performance.

Several reports have already documented the performance of distributed NoSQL databases on virtual and bare metal cloud infrastructures:

10 Comments |

Permalink |