diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6ca2f14
--- /dev/null
+++ b/README.md
@@ -0,0 +1,243 @@

# PostgreSQL and Repmgr for High Availability and automated load balancing with HAProxy, Corosync and Pacemaker

*(Badges: Build Status, Ansible Galaxy, Ansible Quality Score, MIT License)*

## Introduction

Ansible role to configure PostgreSQL and Repmgr for High Availability, with automated load balancing via HAProxy, Corosync and Pacemaker.

## Automated replication and failover

The base functionality offered by the role is to configure a PostgreSQL cluster with replication monitored by Repmgr.
Ideally you will need 3 nodes in order to satisfy the quorum size.

Each node should have `postgres_role` set to either `master` (exactly one node) or `slave` (two or more nodes):

```yaml
host_vars:
  node1:
    postgres_role: master
  node2:
    postgres_role: slave
  node3:
    postgres_role: slave
```

The role works out of the box with standard master-slave replication and no extra configuration. Check [defaults/main.yml](./defaults/main.yml) for optional config vars.

To check the replication status, log in as postgres and run:

```
postgres@node1:~$ psql -Aqtc "SELECT count(*) FROM pg_stat_replication"
2
```

## Managed Replication using Repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.

To enable installation and configuration via repmgr, set the `postgres_repmgr_enabled` group variable to true:

```yaml
group_vars:
  postgres_repmgr_enabled: true
```

repmgr will then be used instead of pg_basebackup to clone the master, and each standby will be registered in the repmgr cluster.

Once configured, you can check the replication status via repmgr as follows:

```
postgres@node1:~$ repmgr -f /etc/repmgr.conf cluster crosscheck
INFO: connecting to database
 Name        | ID | 1 | 2 | 3
-------------+----+---+---+---
 node1       | 1  | * | * | *
 node2       | 2  | * | * | *
 node3       | 3  | * | * | *


postgres@node1:~$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+-------------------------------------------------------------
 1  | node1       | primary | * running |             | default  | 100      | 1        | host=172.17.0.3 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2       | standby |   running | node1       | default  | 100      | 1        | host=172.17.0.4 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3       | standby |   running | node1       | default  | 100      | 1        | host=172.17.0.5 dbname=repmgr user=repmgr connect_timeout=2
```

## Load Balancing and Automated Failover

Repmgr keeps replication functional by promoting a new master (primary) and re-pointing the replication targets of the slaves. The clients, however, won't know which node is the master and which are the slaves unless they do some discovery (e.g. checking the replication status).

The approach taken by this role uses Corosync/Pacemaker to manage iptables rules, automatically blocking writes on the slaves when repmgr triggers failover events. In short (a sketch of the implied iptables rules follows the list):

- each node listens on 3 ports: 5432 (default), 5434 (read-only) and 5435 (read-write), the last two being set up via iptables PREROUTING rules
- the master carries the `postgresql_role=primary` Corosync node attribute and the slaves carry `postgresql_role=replica`
- Corosync blocks port 5435 when `postgresql_role=replica`
- Corosync blocks ports 5435 and 5434 when `postgresql_role=fail`
- the load balancer probes port 5435 for read-write traffic, and only one server, the one with `postgresql_role=primary`, will NOT reject the connections
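
To make this concrete, here is a minimal, hypothetical sketch of the kind of iptables rules the list above implies. The role actually manages its rules through Pacemaker resources (and ships the `iptables_raw` module), so the exact chains, matches and the redirect target (5432) below are assumptions made purely for illustration:

```bash
# Illustration only -- assumed rules, not copied from the role's tasks.

# On every node: expose the read-only (5434) and read-write (5435) ports
# by redirecting them to the real PostgreSQL port (assumed to be 5432).
iptables -t nat -A PREROUTING -p tcp --dport 5434 -j REDIRECT --to-ports 5432
iptables -t nat -A PREROUTING -p tcp --dport 5435 -j REDIRECT --to-ports 5432

# On a replica: reject the read-write port so that only the primary accepts
# writes. The REDIRECT above rewrites the destination port before the filter
# table sees the packet, so match on the connection's ORIGINAL destination
# port via conntrack.
iptables -I INPUT -p tcp -m conntrack --ctproto tcp --ctorigdstport 5435 \
  -j REJECT --reject-with tcp-reset
```

On a failed node the same kind of rule would cover port 5434 as well, so the load balancer stops sending read-only traffic there too.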

### Requirements

The nodes should have Corosync with Pacemaker already configured. Check my [corosync_pacemaker ansible role](https://github.com/mariancraciun1983/ansible-corosync-pacemaker). Having `symmetric-cluster` set to `false` is also required, so that resources are not assigned to all nodes, but only where the Pacemaker colocation rules place them.

`playbook.yml`:

```yaml
- name: Prepare Corosync/Pacemaker
  hosts: all
  gather_facts: true
  roles:
    - mariancraciun1983.corosync_pacemaker
    - mariancraciun1983.postgres_repmgr_ha
```

`group_vars/all.yml`:

```yaml
# corosync config
install_python3: true
corosync_hacluster_password: 1q2w3e4r5t
corosync_cluster_settings:
  - key: stonith-enabled
    value: "false"
  - key: no-quorum-policy
    value: ignore
  - key: start-failure-is-fatal
    value: "false"
  - key: symmetric-cluster
    value: "false"

# enable repmgr
postgres_repmgr_enabled: true
# enable repmgr integration with pacemaker
postgres_repmgr_pacemaker: true
# This must be true only once, initially, when the Corosync node attributes
# need to be configured; after that, repmgr triggers the node attribute
# updates itself in case of a failover.
postgres_repmgr_pacemaker_helpers_init: true
```

Running `crm_mon -AnfroRtc` gives us the following:

```
Cluster Summary:
  * Stack: corosync
  * Current DC: repmgrpmk2 (2) (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Mon Nov 16 05:53:53 2020
  * Last change: Mon Nov 16 05:53:06 2020 by postgres via crm_attribute on repmgrpmk2
  * 3 nodes configured
  * 6 resource instances configured

Node List:
  * Node repmgrpmk1 (1): online:
    * Resources:
  * Node repmgrpmk2 (2): online:
    * Resources:
      * PostgresqlLBWriteBlock (ocf::heartbeat:command_raw): Started
  * Node repmgrpmk3 (3): online:
    * Resources:
      * PostgresqlLBWriteBlock (ocf::heartbeat:command_raw): Started

Inactive Resources:
  * Clone Set: PostgresqlLBReadBlock-clone [PostgresqlLBReadBlock]:
    * PostgresqlLBReadBlock (ocf::heartbeat:command_raw): Stopped
    * PostgresqlLBReadBlock (ocf::heartbeat:command_raw): Stopped
    * PostgresqlLBReadBlock (ocf::heartbeat:command_raw): Stopped
    * Stopped: [ repmgrpmk1 repmgrpmk2 repmgrpmk3 ]
  * Clone Set: PostgresqlLBWriteBlock-clone [PostgresqlLBWriteBlock]:
    * PostgresqlLBWriteBlock (ocf::heartbeat:command_raw): Started repmgrpmk2
    * PostgresqlLBWriteBlock (ocf::heartbeat:command_raw): Started repmgrpmk3
    * PostgresqlLBWriteBlock (ocf::heartbeat:command_raw): Stopped
    * Started: [ repmgrpmk2 repmgrpmk3 ]
    * Stopped: [ repmgrpmk1 ]

Node Attributes:
  * Node: repmgrpmk1 (1):
    * postgresql_role : primary
  * Node: repmgrpmk2 (2):
    * postgresql_role : replica
  * Node: repmgrpmk3 (3):
    * postgresql_role : replica

Operations:
  * Node: repmgrpmk2 (2):
    * PostgresqlLBWriteBlock: migration-threshold=1000000:
      * (12) start: last-rc-change="Mon Nov 16 05:53:06 2020" last-run="Mon Nov 16 05:53:06 2020" exec-time="21ms" queue-time="0ms" rc=0 (ok)
      * (13) monitor: interval="10000ms" last-rc-change="Mon Nov 16 05:53:06 2020" exec-time="13ms" queue-time="0ms" rc=0 (ok)
  * Node: repmgrpmk1 (1):
  * Node: repmgrpmk3 (3):
    * PostgresqlLBWriteBlock: migration-threshold=1000000:
      * (12) start: last-rc-change="Mon Nov 16 05:53:06 2020" last-run="Mon Nov 16 05:53:06 2020" exec-time="28ms" queue-time="0ms" rc=0 (ok)
      * (13) monitor: interval="10000ms" last-rc-change="Mon Nov 16 05:53:06 2020" exec-time="16ms" queue-time="0ms" rc=0 (ok)
```
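
As a quick sanity check (not something the role configures; a hypothetical probe built from the hostnames and `repmgr` connection details shown in the outputs above), you can verify that only the node whose `postgresql_role` attribute is `primary` still accepts connections on the read-write port 5435:

```bash
# Probe the RW port (5435) on every cluster node; the replicas should
# reject the connection because of the Pacemaker-managed write block.
for host in repmgrpmk1 repmgrpmk2 repmgrpmk3; do
  echo -n "$host: "
  psql "host=$host port=5435 dbname=repmgr user=repmgr connect_timeout=2" \
    -Aqtc "SELECT pg_is_in_recovery()" 2>/dev/null || echo "rejected"
done
```

`pg_is_in_recovery()` returns `f` on the primary, so a healthy cluster should print `f` for exactly one host and `rejected` for the others.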
2020" exec-time="13ms" queue-time="0ms" rc=0 (ok) + * Node: repmgrpmk1 (1): + * Node: repmgrpmk3 (3): + * PostgresqlLBWriteBlock: migration-threshold=1000000: + * (12) start: last-rc-change="Mon Nov 16 05:53:06 2020" last-run="Mon Nov 16 05:53:06 2020" exec-time="28ms" queue-time="0ms" rc=0 (ok) + * (13) monitor: interval="10000ms" last-rc-change="Mon Nov 16 05:53:06 2020" exec-time="16ms" queue-time="0ms" rc=0 (ok) +``` + +## Others +The role also offers the posibility to use a different/internal ip +as follows: +```yaml +group_vars: + postgres_use_internal_ip: true +host_vars: + node1: + postgres_role: master + internal_ip: 10.0.0.1 + node2: + postgres_role: slave + internal_ip: 10.0.0.2 + node3: + postgres_role: slave + internal_ip: 10.0.0.3 +``` + +# TODO + - integrate barman + - add witness node configuration + +## Testing + +Molecule with docker is being used with 2 scenarios: + - default - non-repmgr + - repmgr + - repmgr + corosync/pacemaker + +Running the tests: + +```bash +pipenv install +pipenv run molecule test + +# or test individual scenario +# standard replication +pipenv run molecule test -s default +# repmgr managed replication and failover +pipenv run molecule test -s repmgr +# repmgr with corosync/pacemaker +pipenv run molecule test -s repmgrpmk +``` + +## License + +MIT License + +The code contains the [iptables_raw](https://github.com/Nordeus/ansible_iptables_raw) ansible module which is also licensed under MIT License. \ No newline at end of file diff --git a/meta/main.yml b/meta/main.yml index e13a268..fe81f34 100644 --- a/meta/main.yml +++ b/meta/main.yml @@ -3,7 +3,7 @@ galaxy_info: role_name: postgres_repmgr_ha author: Marian Craciun description: >- - Ansible role to configure PostgreSQL, Repmgr for High Availability + Configure PostgreSQL, Repmgr for High Availability and automated load balancing with HAProxy, Corosync and Pacemaker license: MIT diff --git a/tasks/repmgr/pacemaker/resources.yml b/tasks/repmgr/pacemaker/resources.yml index 1d74fa5..611c486 100644 --- a/tasks/repmgr/pacemaker/resources.yml +++ b/tasks/repmgr/pacemaker/resources.yml @@ -4,7 +4,7 @@ register: pcs_out changed_when: pcs_out.rc == 0 failed_when: >- - pcs_out.rc != 0 and \ + pcs_out.rc != 0 and pcs_out.stderr.find('already exists') == -1 and pcs_out.stderr.find('is already a clone') == -1 with_items: