Managing High Availability in PostgreSQL – Part III: Patroni

Monday

Sep162019

Managing High Availability in PostgreSQL – Part III: Patroni

Monday, September 16, 2019 at 12:31PM

In our previous blog posts, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by Cluster Labs and Replication Manager (repmgr) by 2ndQuadrant. In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment.

Patroni for PostgreSQL

Patroni originated as a fork of Governor, a project from Compose. It is an open-source tool suite, written in Python, for managing high availability of PostgreSQL clusters. Instead of building its own consistency protocol, Patroni smartly leverages the consistency model provided by a Distributed Configuration Store (DCS). It also supports other DCS solutions like Zookeeper, etcd, Consul and Kubernetes.

Patroni ensures the end-to-end setup of PostgreSQL HA clusters, including streaming replication. It supports various ways for creating a standby node, and works like a template that can be customized to your needs.

This feature-rich tool exposes its functionality via REST APIs and also via a command line utility called patronictl. It supports integration with HAProxy by using its health check APIs to handle load balancing.

Patroni also supports event notification with the help of callbacks, which are scripts triggered by certain actions. It enables users to perform any maintenance actions by providing pause/resume functionality. The Watchdog support feature makes the framework even more robust.

How it Works

Initially, PostgreSQL and Patroni binaries needs to be installed. Once this is done, you will also need to setup a HA DCS configuration. All the necessary configurations to bootstrap the cluster needs to be specified in the yaml configuration file and Patroni will use this file for initialization. On the first node, Patroni initializes the database, obtains the leader lock from DCS, and ensures the node is being run as the master.

The next step is adding standby nodes, for which Patroni provides multiple options. By default, Patroni uses pg_basebackup to create the standby node, and also supports custom methods like WAL-E, pgBackRest, Barman and others for the standby node creation. Patroni makes it very simple to add a standby node, and handles all the bootstrapping tasks and setting up of your streaming replication.

Once your cluster setup is complete, Patroni will actively monitor the cluster and ensure it's in a healthy state. The master node renews the leader lock every ttl second(s) (default: 30 seconds). When the master node fails to renew the leader lock, Patroni triggers an election, and the node which will obtain the leader lock will be elected as the new master.

How Does it Handle the Split Brain Scenario?

In a distributed system, consensus plays an important role in determining consistency, and Patroni uses DCS to attain consensus. Only the node that holds the leader lock can be the master and the leader lock is obtained via DCS. If the master node doesn’t hold the leader lock, then it will be demoted immediately by Patroni to run as a standby. This way, at any point in time, there can only be one master running in the system.

Are There Any Setup Requirements?

Patroni needs python 2.7 and above.
The DCS and its specific python module must be installed. For test purposes, DCS can be installed on same nodes running PostgreSQL. However, in production, DCS must be installed on separate nodes.

The yaml configuration file must be present using these high level configuration settings:

Global/Universal
This includes configuration such as the name of the host (name) which needs to be unique for the cluster, the name of the cluster (scope) and path for storing config in DCS (namespace).

Log
Patroni-specific log settings including level, format, file_num, file_size etc.

Bootstrap configuration
This is the global configuration for a cluster that will be written to DCS. These configuration parameters can be changed with the help of Patroni APIs or directly in DCS.The bootstrap configuration includes standby creation methods, initdb parameters, post initialization script etc. It also contains timeouts configuration, parameters to decide the usage of PostgreSQL features like replication slots, synchronous mode etc.This section will be written into /<namespace>/<scope>/config of a given configuration store after the initializing of new cluster.

PostgreSQL
This section contains the PostgreSQL-specific parameters like authentication, directory paths for data, binary and config, listen ip address etc.

REST API
This section includes the Patroni-specific configuration related to REST API's such as listen address, authentication, SSL etc.

Consul
Settings specific to Consul DCS.

Etcd
Settings specific to Etcd DCS.

Exhibitor
Settings specific to Exhibitor DCS.

Kubernetes
Settings specific to Kubernetes DCS.

ZooKeeper
Settings specific to ZooKeeper DCS.

Watchdog
Settings specific to Watchdog.

Patroni Pros

Patroni enables end-to-end setup up of the cluster.
Supports REST APIs and HAproxy integration.
Supports event notifications via callbacks scripts triggered by certain actions.
Leverages DCS for consensus.

Patroni Cons

Patroni will not detect the misconfiguration of a standby with an unknown or non-existent node in recovery configuration. The node will be shown as a slave even if the standby is running without connecting to the master/cascading standby node.
User needs to handle setup, management, and upgrade of the DCS software.
Requires multiple ports to be open for components communication:
- REST API port for Patroni
- Minimum 2 ports for DCS

High Availability Test Scenarios

We conducted a few tests on PostgreSQL HA management using Patroni. All of these tests were performed while the application was running and inserting data to the PostgreSQL database. The application was written using PostgreSQL Java JDBC Driver leveraging the connection failover capability.

Standby Server Tests

Sl. No	Test Scenario	Observation
1	Kill the PostgreSQL process	Patroni brought the PostgreSQL process back to running state. There was no disruption of the writer application.
2	Stop the PostgreSQL process	Patroni brought the PostgreSQL process back to running state. There was no disruption of the writer application.
3	Reboot the server	Patroni needs to be started after reboot, unless configured to not start on reboot. Once Patroni was started, it started the PostgreSQL process and setup the standby configuration. There was no disruption of the writer application.
4	Stop the Patroni process	It did not stop the PostgreSQL process. patronictl list did not display this server. There was no disruption of the writer application. So, essentially, you need to monitor the health of the Patroni process - otherwise it will lead to issues down the line.

Master/Primary Server Tests

Sl. No	Test Scenario	Observation
1	Kill the PostgreSQL process	Patroni brought the PostgreSQL process back to running state. Patroni running on that node had primary lock and so the election was not triggered. There was downtime in the writer application.
2	Stop the PostgreSQL process and bring it back immediately after health check expiry	Patroni brought the PostgreSQL process back to running state. Patroni running on that node had primary lock and so the election was not triggered. There was downtime in the writer application.
3	Reboot the server	Failover happened and one of the standby servers was elected as the new master after obtaining the lock. When Patroni was started on the old master, it brought back the old master up and performed pg_rewind and started following the new master.T There was downtime in the writer application.
4	Stop/Kill the Patroni process	One of the standby servers acquired the DCS lock and became the master by promoting itself. The old master was still running and it led to multi-master scenario. The application was still writing to the old master. Once Patroni was started on the old master, it rewound the old master (use_pg_rewind was set to true) to the new master timeline and lsn and started following the new master. As you can see above, it is very important to monitor the health of the Patroni process on the master. Failure to do so can lead to a multi-master scenario and potential data loss.

Network Isolation Tests

Sl. No	Test Scenario	Observation
1	Network-isolate the master server from other servers	DCS communication was blocked for master node. PostgreSQL was demoted on the master server. A new master was elected in the majority partition. There was a downtime in the writer application.
2	Network-isolate the standby server from other servers	DCS communication was blocked for the standby node. The PostgreSQL service was running, however, the node was not considered for elections. There was no disruption in the writer application.

What's the Best PostgreSQL HA Framework?

Patroni is a valuable tool for PostgreSQL database administrators (DBAs), as it performs end-to-end setup and monitoring of a PostgreSQL cluster. The flexibility of choosing DCS and standby creation is an advantage to the end user, as they can choose the method they are comfortable with.

REST APIs, HaProxy integration, Watchdog support, callbacks and its feature-rich management makes Patroni the best solution for PostgreSQL HA management.

PostgreSQL HA Framework Testing: PAF vs. repmgr vs. Patroni

Included below is a comprehensive table detailing the results of all the tests we have performed on all three frameworks - PostgreSQL Automatic Failover (PAF), Replication Manager (repmgr) and Patroni.

Standby Server Tests

Test Scenario	PostgreSQL Automatic Failover (PAF)	Replication Manager (repmgr)	Patroni
Kill the PostgreSQL process	Pacemaker brought the PostgreSQL process back to running state. There was no disruption of the writer application.	Standby server was marked as failed. Manual intervention was required to start the PostgreSQL process again. There was no disruption of the writer application.	Patroni brought the PostgreSQL process back to running state. There was no disruption of the writer application.
Stop the PostgreSQL process	Pacemaker brought the PostgreSQL process back to running state. There was no disruption of the writer application.	Standby server was marked as failed. Manual intervention was required to start the PostgreSQL process again. There was no disruption of the writer application.	Patroni brought the PostgreSQL process back to running state. There was no disruption of the writer application.
Reboot the server	Standby server was marked offline initially. Once the server came up after reboot, PostgreSQL was started by Pacemaker and the server was marked as online. If fencing was enabled then node wouldn’t have been added automatically to cluster. There was no disruption of the writer application.	Standby server was marked as failed. Once the server came up after reboot, PostgreSQL was started manually and server was marked as running. There was no disruption of the writer application.	Patroni needs to be started after reboot, unless configured to not start on reboot. Once Patroni was started, it started the PostgreSQL process and setup the standby configuration. There was no disruption of the writer application.
Stop the framework agent process	Agent: pacemaker The PostgreSQL process was stopped and was marked offline. There was no disruption of the writer application.	Agent: repmgrd The standby server will not be part of automated failover situation. PostgreSQL service was found to be running. There was no disruption of the writer application.	Agent: patroni It did not stop the PostgreSQL process. patronictl list did not display this server. There was no disruption of the writer application.

Master/Primary Server Tests

Test Scenario	PostgreSQL Automatic Failover (PAF)	Replication Manager (repmgr)	Patroni
Kill the PostgreSQL process	Pacemaker brought the PostgreSQL process back to running state. Primary got recovered within the threshold time and hence election was not triggered. There was downtime in the writer application.	repmgrd started health check for primary server connection on all standby servers for a fixed interval. When all retries failed, an election was triggered on all the standby servers. As a result of the election, the standby which had the latest received LSN got promoted. The standby servers which lost the election will wait for the notification from the new master node and will follow it once they receive the notification.Manual intervention was required to start the postgreSQL process again. There was downtime in the writer application.	Patroni brought the PostgreSQL process back to running state. Patroni running on that node had primary lock and hence election was not triggered. There was downtime in the writer application.
Stop the PostgreSQL process and bring it back immediately after health check expiry	Pacemaker brought the PostgreSQL process back to running state. Primary got recovered within the threshold time and hence election was not triggered. There was downtime in the writer application.	repmgrd started health check for primary server connections on all standby servers for a fixed interval. When all the retries failed, an election was triggered on all the standby nodes. However, the newly elected master didn’t notify the existing standby servers since the old master was back.Cluster was left in an indeterminate state and manual intervention was required. There was downtime in the writer application.	Patroni brought the PostgreSQL process back to running state. Patroni running on that node had primary lock and hence election was not triggered. There was downtime in the writer application.
Reboot the server	Election was triggered by Pacemaker after the threshold time for which master was not available. The most eligible standby server was promoted as the new master. Once the old master came up after reboot, it was added back to the cluster as a standby. If fencing was enabled, then node wouldn’t have been added automatically to cluster. There was downtime in the writer application.	repmgrd started election when master connection health check failed on all standby servers. The eligible standby was promoted. When this server came back, it didn’t join the cluster and was marked failed. repmgr node rejoin command was run to add the server back to the cluster. There was downtime in the writer application.	Failover happened and one of the standby servers was elected as the new master after obtaining the lock. When Patroni was started on the old master, it brought back the old master up and performed pg_rewind and started following the new master. There was downtime in the writer application.
Stop the framework agent process	Agent: pacemaker The PostgreSQL process was stopped and it was marked offline. Election was triggered and new master was elected. There was downtime in writer application.	Agent: repmgrd The primary server will not be part of the automated failover situation. PostgreSQL service was found to be running. There was no disruption in writer application.	Agent: patroni One of the standby servers acquired the DCS lock and became the master by promoting itself. The old master was still running and it led to multi-master scenario. The application was still writing to the old master. Once Patroni was started on the old master, it rewound the old master (use_pg_rewind was set to true) to the new master timeline and lsn and started following the new master.

Network Isolation Tests

Test Scenario	PostgreSQL Automatic Failover (PAF)	Replication Manager (repmgr)	Patroni
Network isolate the master server from other servers (split brain scenario)	Corosync traffic was blocked on the master server. PostgreSQL service was turned off and master server was marked offline due to quorum policy. A new master was elected in the majority partition. There was a downtime in the writer application.	All servers have the same value for location in repmgr configuration: repmgrd started election when master connection health check failed on all standby servers. The eligible standby was promoted, but the PostgreSQL process was still running on the old master node. There were two nodes running as master. Manual intervention was required after the network isolation was corrected. The standby servers have the same value for location but the primary had a different value for location in repmgr configuration: repmgrd started election when master connection health check failed on all standby servers. But, there was no new master elected since the standby servers had location different from that of the primary. repmgrd went into degrade monitoring mode. PostgreSQL was running on all the nodes and there was only one master in the cluster.	DCS communication was blocked for master node. PostgreSQL was demoted on the master server. A new master was elected in the majority partition. There was a downtime in the writer application.
Network-isolate the standby server from other servers	Corosync traffic was blocked on the standby server. The server was marked offline and PostgreSQL service was turned off due to quorum policy. There was no disruption in the writer application.	repmgrd went into degrade monitoring mode. The PostgreSQL process was still running on the standby node. Manual intervention was required after the network isolation was corrected.	DCS communication was blocked for the standby node. The PostgreSQL service was running, however, the node was not considered for elections. There was no disruption in the writer application.

Test Scenario

PostgreSQL Automatic Failover (PAF)

Replication Manager (repmgr)

Patroni

Network isolate the master server from other servers (split brain scenario)

Corosync traffic was blocked on the master server.

PostgreSQL service was turned off and master server was marked offline due to quorum policy.
A new master was elected in the majority partition.
There was a downtime in the writer application.

All servers have the same value for location in repmgr configuration:

repmgrd started election when master connection health check failed on all standby servers.
The eligible standby was promoted, but the PostgreSQL process was still running on the old master node.
There were two nodes running as master. Manual intervention was required after the network isolation was corrected.

The standby servers have the same value for location but the primary had a different value for location in repmgr configuration:

repmgrd started election when master connection health check failed on all standby servers.
But, there was no new master elected since the standby servers had location different from that of the primary.
repmgrd went into degrade monitoring mode. PostgreSQL was running on all the nodes and there was only one master in the cluster.

DCS communication was blocked for master node.

PostgreSQL was demoted on the master server.
A new master was elected in the majority partition.
There was a downtime in the writer application.

Network-isolate the standby server from other servers

Corosync traffic was blocked on the standby server.

The server was marked offline and PostgreSQL service was turned off due to quorum policy.
There was no disruption in the writer application.

repmgrd went into degrade monitoring mode.
The PostgreSQL process was still running on the standby node.
Manual intervention was required after the network isolation was corrected.

DCS communication was blocked for the standby node.

The PostgreSQL service was running, however, the node was not considered for elections.
There was no disruption in the writer application.

Kristi Anderson |

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

Post a New Comment

Enter your information below to add a new comment.

Author:

Author Email (optional):

Author URL (optional):

Post:

↓ | ↑

Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>