Resetting the Galera Cluster Quorum

Checklist for Galera Cluster Quorum Resetting


Introduction – What should you know about Galera Cluster to troubleshoot efficiently?

We have several customers on Galera Cluster, and it works great if you are building a synchronous MySQL / MariaDB replication solution for high availability and scale-out. Unlike a MySQL / MariaDB master-slave replication topology, every Galera node (slave) is master-ready at all times. Galera replication can guarantee zero slave lag for such installations and, thanks to parallel slave applying, much better throughput for the cluster. Galera Cluster is a write-set replication service provider delivered as a dlopen-able library. The heart of Galera Cluster replication is the wsrep API, which consists of two elements (see the configuration sketch after the list below):

  • wsrep hooks: integrate the database server with write-set replication
  • dlopen(): makes the wsrep provider library available to the wsrep hooks
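
For illustration, below is a minimal my.cnf sketch showing how the wsrep provider library is typically loaded on a node. The library path, cluster name and node addresses are assumptions for this example; adjust them for your distribution and environment.

[mysqld]
# Load the Galera wsrep provider library (path varies by distribution)
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Other cluster members (example addresses only)
wsrep_cluster_address=gcomm://192.168.1.11,192.168.1.12,192.168.1.13
wsrep_cluster_name=example_cluster
# Galera requires row-based binary logging and InnoDB
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2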

The primary focus of Galera Cluster is data consistency. Transactions are either applied to every node or not at all (an all-or-nothing transaction guarantee). In a Galera cluster, transaction commits (row-based replication events) are applied on all servers via certification-based replication. Certification-based replication is an alternative approach to synchronous database replication that uses group communication and transaction ordering techniques.
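
To make the certification failure case concrete, here is a hedged illustration using a hypothetical accounts table, with two clients writing the same row on different nodes at nearly the same time. The losing transaction is rolled back and its client typically receives a deadlock error.

-- Client connected to node1
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;   -- wins certification; the write-set is applied on every node

-- Client connected to node2, overlapping in time with the commit above
UPDATE accounts SET balance = balance - 50 WHERE id = 42;
COMMIT;   -- fails certification and is rolled back, typically reported as:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction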

How is Galera Cluster Replication different from MySQL Replication?

MySQL Replication (Asynchronous Master-Slave Replication) vs. Galera Cluster Replication (Synchronous Replication)

How changes are replicated

  • MySQL Replication: The change data captured on the MySQL master is written to the binary log. Replication happens in three threads: a dump thread on the master continuously reads the binary log and sends it to the slave; an IO thread on the slave receives the binlog events sent by the master's dump thread and writes them to a file called the relay log; a third thread on the slave, the SQL thread, continuously reads the relay log and applies the changes to the slave server.
  • Galera Cluster Replication: Certification-based replication – transactions are ordered using a group communication method, a transaction executes on a single node (we at MinervaDB do not recommend writing to multiple Galera Cluster nodes in parallel), and at COMMIT a coordinated, certification-based process ensures transaction consistency across the cluster.

Where writes go

  • MySQL Replication: In MySQL master-slave asynchronous replication, UPDATEs should always be done on one master and are then propagated to the slaves. Although it is possible to create a ring topology with multiple masters, we at MinervaDB do not recommend it, as it is very easy for the servers to get out of sync if a master fails.
  • Galera Cluster Replication: Every node in a Galera Cluster is always write-ready. Whenever a transaction commits, the row-based replication events are applied on all servers via certification-based replication. In certification-based replication, a transaction executes until it reaches the commit point, assuming there is no conflict. When the client issues COMMIT, before the actual commit happens, the primary keys of the changed rows are copied into a write-set and sent to all other nodes. A deterministic certification test based on those primary keys runs on every node in the cluster, including the node where the write-set originated, and determines whether the node can apply the write-set. If the test succeeds, the write-set is applied on the rest of the cluster and the transaction is committed. If the certification test fails, the node drops the write-set and the cluster rolls back the original transaction.

Failover and self-healing

  • MySQL Replication: Failover is a manual process even in an ideal / normal scenario. If the promotion of a replica slave to master is not carefully planned, there is a high chance you will corrupt the entire MySQL replication infrastructure.
  • Galera Cluster Replication: You can build a self-healing MySQL / MariaDB replication solution directly with Galera Cluster. If a node fails, the other nodes can (and will) keep operating without any impact on database reliability. When the failed node comes back, it automatically synchronizes with the other nodes through State Snapshot Transfer (SST) or Incremental State Transfer (IST), depending on its last known state, before it is allowed back into the cluster. No data is lost when a node fails.
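
As a quick, hedged illustration of how you would observe each model in practice, the standard checks below apply; the exact output depends on your environment.

-- On a MySQL replication slave: thread state and lag
SHOW SLAVE STATUS\G
-- Relevant fields: Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master

-- On any Galera Cluster node: cluster membership and local node state
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';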

When do you Reset the Galera Cluster Quorum?

During a network outage, failure, or split-brain situation, the nodes may come to suspect that there is another Primary Component to which they are no longer connected. When this happens, all nodes return an unknown command error to every query. To confirm this, query the status variable wsrep_cluster_status on each node:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

If the query returns Primary, the node is part of the Primary Component. Any other value indicates that the node is part of a non-operational component. If none of the nodes return the value Primary, you have to reset the quorum.
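
As a hedged illustration, the output on a partitioned node might look like the following; any value other than Primary (such as non-Primary) indicates a non-operational component.

+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+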

Finding the Most Advanced Node

Identify the most advanced node in the cluster before resetting the quorum. Technically, the most advanced node is the one that committed the last transaction; it will serve as the starting point for the new Primary Component. The most advanced node is the one with the highest sequence number, or seqno, which you can determine using the wsrep_last_committed status variable.

From the database client on each node, run the following query:

SHOW STATUS LIKE 'wsrep_last_committed';
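
For example (hypothetical output), compare the values reported by every node; the node with the highest wsrep_last_committed is the most advanced one and should be used to bootstrap the new Primary Component.

node1:  wsrep_last_committed = 409745
node2:  wsrep_last_committed = 409747   <-- highest seqno, bootstrap from this node
node3:  wsrep_last_committed = 409746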

Resetting the Quorum

By resetting the quorum you are actually bootstrapping the Primary Component on the most advanced node available. That node then functions as the new Primary Component, bringing the rest of the cluster into line with its state.

You can perform this process either automatically or manually.

Automatic Bootstrap

Once you have identified the most advanced node, you can dynamically enable pc.bootstrap under wsrep_provider_options, making that node the new Primary Component. Run the following command on it:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
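
After the bootstrap, you can verify (as an illustrative check) that the node has formed a new Primary Component and that the other nodes are re-joining:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';   -- should now return Primary
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';     -- grows as the other nodes re-join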

Manual Bootstrap

To manually bootstrap your cluster, complete the following steps; a short verification sketch follows the last step:

1. Shut down all cluster nodes. For servers that use init, run the following command from the console:

# service mysql stop

For servers that use systemd, instead run this command:

# systemctl stop mysql

2. Start the most advanced node with the --wsrep-new-cluster option. For servers that use init, run the following command:

# service mysql start --wsrep-new-cluster

For servers that use systemd and Galera Cluster for MySQL 5.5 or 5.6, instead run this command:

# systemctl start mysql --wsrep-new-cluster

For servers that use systemd and Galera Cluster for MySQL 5.7, use the following command:

# /usr/bin/mysqld_bootstrap

3. Start every other node in the cluster. For servers that use init, run the following command:

# service mysql start

For servers that use systemd, instead run this command:

# systemctl start mysql
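
Once every node is back, you can confirm (as an illustrative check) that the whole cluster has re-formed around the bootstrapped node:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';        -- Primary on every node
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';          -- equals the number of nodes, e.g. 3
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';   -- Synced once a node has caught up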

Why do we recommend Automatic Bootstrap?

When we follow the automatic bootstrap process, the write-set cache, or GCache, is preserved on each node. This means that some or all of the joining nodes can provision themselves using the Incremental State Transfer (IST) method, rather than the much slower State Snapshot Transfer (SST) method.
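
Whether a joining node can use IST depends largely on how much write-set history fits in the GCache. As a hedged illustration, you can inspect the current setting in the provider options and enlarge it in my.cnf; the 1G value below is only an example, not a recommendation, and requires a restart to take effect.

SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';   -- look for gcache.size in the output

# in my.cnf, under [mysqld]
wsrep_provider_options="gcache.size=1G"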

