Any Galera documentation about its limitations will state that tables must have primary keys: on tables without one, DELETEs are unsupported, and other DML statements can have unwanted side effects such as inconsistent ordering, where rows appear in a different order on different nodes of your cluster.

If you are not actively relying on row order, this can seem acceptable: a quick test shows that DELETEs work and are replicated. This is especially tempting when the table schema comes from an existing solution such as Magento or Hybris, where the cost of adapting the schema seems higher than the risk associated with a missing primary key.

After all, what is the risk, exactly?

The risk of lacking primary keys

As the documentation states, the most impactful risk concerns DELETEs. On the source node, the DELETE will execute without issue.

The issue is on the replication side. It is not caused by any Galera internal; it is a row-based replication limitation, and it has been written about before, for example in Attack No-PK Replication Lag with MySQL/Percona Server 8 Invisible Columns.

On asynchronous primary-replica setups, a DELETE on a table without a primary key will cause severe replication lag, which is already a risk for your data redundancy and high availability. Your replicas will be out of sync, so any read-write split mechanism will have you reading obsolete data from your replicas. Your primary stays unaffected.

On a Galera cluster, though, the risk is severely aggravated by its semi-synchronous features: flow control and certification.

On a large table, any massive DELETE can lead to a completely blocked cluster, with every transaction stuck in the “wsrep: initiating replication for write set” state.

Story of a production recurrent failure

A simple table was used to record “raw” customer transactions made through external processes. The table would grow every week. Batches would run, each row would be read and properly integrated into the regular tables, and then the whole table would be emptied.

At first, this was a minor system with a few thousand rows at most, so it ran smoothly for years. Then thousands became millions one day, thanks to a great marketing campaign, and that’s how the problems started. Because an ORM was used, emptying the table was actually done with a plain DELETE rather than a proper TRUNCATE TABLE, as sketched below.
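A minimal illustration of the difference, using a hypothetical table name (the real schema is not shown here):

    -- What the ORM effectively issued: a full-table DELETE, replicated row by row,
    -- which is the worst case for a table without a primary key.
    DELETE FROM raw_transactions;

    -- What would have avoided the problem: TRUNCATE is replicated as a statement (DDL),
    -- regardless of primary keys, and empties the table almost instantly.
    TRUNCATE TABLE raw_transactions;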

Some specifics worth noting:

  • Only one node served writes.
  • The flow control limit was already set at 100000 to relieve replication of some of its semi-synchronous constraints (see the sketch after this list).
  • Hardware was generous, with 60+ GBs of RAM and NVMe disks.
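For reference, raising that limit is typically done through the Galera provider options; the exact option string used in that setup is an assumption here, but a dynamic sketch looks like this:

    -- gcs.fc_limit controls how far a node's receive queue may grow before it asks
    -- the rest of the cluster to pause writes (flow control).
    SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=100000';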

The DELETE was launched every Saturday night, and the symptoms were:

  • Painfully slow performance on the whole cluster during the weekend.
  • On Mondays, every transaction was blocked on “wsrep: initiating replication for write set” for hours. Waiting did not help; it did not seem to improve on its own.
  • Trying to stop “replica” nodes would hang indefinitely.
  • Killing replicas and starting one again would block transactions again.

Why such severe symptoms

Due to https://bugs.mysql.com/bug.php?id=53375, the DELETE was actually performing one full table scan for each record to delete.

With one million rows deleted in one go, applying that single transaction meant the “replica” nodes had to iterate over roughly one million rows for each of the one million deleted rows: on the order of a trillion row reads.

This was the default behavior on MySQL 5.7, at least; on 8.0, the cost is far lower. More on that later!

Even waiting a week did not cut it: with a big enough table, the replica nodes never managed to apply the transaction.

Symptoms to look for

Quick reproduction of the issue

Starting from a sysbench oltp_delete workload showing steady performance on the cluster, the issue only takes a couple of statements to reproduce.

We make a copy of one of the sysbench tables in a way that drops the primary key, for example with CREATE TABLE … SELECT, which copies the rows but none of the indexes (see the sketch below).

Now, any DELETE on that copy can bring our whole cluster down.
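A rough sketch of the reproduction, assuming the default sysbench table name sbtest1 and a hypothetical copy called sbtest1_nopk:

    -- CREATE TABLE ... SELECT copies the data but none of the indexes,
    -- so the new table ends up without a primary key.
    CREATE TABLE sbtest1_nopk SELECT * FROM sbtest1;

    -- Fast on the source node, but every other node will have to scan the whole
    -- table once per deleted row to apply the resulting write set.
    DELETE FROM sbtest1_nopk;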

The DELETE itself took only 16 seconds, but now sysbench seems stuck.

Applying it eventually took around two and a half hours, because this was tested on the latest Percona XtraDB Cluster (PXC) 8.0; on PXC 5.7, it could have taken almost four days if none of the workarounds below had been applied.

Blocked commits

From the writer node receiving the queries, you would see every commit stuck in the “wsrep: initiating replication for write set” state.

It indicates another node in the cluster is struggling with its replication queues.
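One way to list those blocked sessions, as a sketch against the standard processlist view:

    -- Sessions stuck in a wsrep-related state, such as
    -- "wsrep: initiating replication for write set", longest-waiting first.
    SELECT id, user, time, state, info
    FROM information_schema.processlist
    WHERE state LIKE 'wsrep%'
    ORDER BY time DESC;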

From a node struggling to replicate, the processlist would instead show the wsrep applier threads. In this test, the slow applier thread stayed in the “Applying batch of row changes (delete)” state for 648 seconds while all the other applier threads were waiting.
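The applier threads run under the “system user” account, so a similar sketch singles them out on the struggling node:

    -- Galera applier threads appear as 'system user'; the one applying the oversized
    -- write set keeps the "Applying batch of row changes (delete)" state for a long time.
    SELECT id, time, state
    FROM information_schema.processlist
    WHERE user = 'system user';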

Flow control

Flow control will become permanent, which explains why every commit is delayed. We can monitor this by comparing two values of ‘wsrep_flow_control_paused_ns’ taken one second apart, as sketched below.
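A minimal way to take those two readings from one session (a sketch; any monitoring tool sampling the counter works just as well):

    -- wsrep_flow_control_paused_ns is a cumulative counter of nanoseconds spent paused.
    -- A delta close to 1,000,000,000 ns between readings taken one second apart means
    -- the node spent almost the entire second paused by flow control.
    SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused_ns';
    SELECT SLEEP(1);
    SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused_ns';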

Locks

Using the InnoDB transaction information, we can find out what the wsrep applier transaction is doing.
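A sketch of that lookup against the standard InnoDB transaction view (the original test may have read the same information from SHOW ENGINE INNODB STATUS):

    -- trx_rows_locked and trx_rows_modified keep growing while the applier transaction
    -- works through the table, which makes them a rough progress indicator.
    SELECT trx_mysql_thread_id, trx_state, trx_started,
           trx_rows_locked, trx_rows_modified
    FROM information_schema.innodb_trx
    ORDER BY trx_started;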

The number of row locks gives an approximation of the progress made. After waiting a bit more, we can compare how it has grown.

Monitoring it over time shows how slowly progress is made: around 100-200 rows are deleted per second.

Solutions and workarounds

Find the responsible table

This is the hardest part; there are no straightforward methods for it.

One approach, a personal method (which is not scientific at all, but has still brought me good results), is to list the latest ibd files checkpointed while the cluster is currently blocked.

slave_rows_search_algorithms

This variable has an immediate effect on how bad this behavior will be. On 8.0, its default is already the best value, “INDEX_SCAN,HASH_SCAN”, which means the applier avoids scanning the whole table for each row to delete.

On 5.7, though, the default was “TABLE_SCAN,INDEX_SCAN”. The difference in performance can be seen easily, and the variable can be changed dynamically, as sketched below.
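A sketch of switching it on a 5.7 node; the variable is dynamic, so no restart is needed:

    -- The 5.7 default is 'TABLE_SCAN,INDEX_SCAN'. Adding HASH_SCAN lets the applier
    -- hash the rows of each event instead of scanning the table once per deleted row.
    SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';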

pxc_strict_mode

If you are using Percona XtraDB Cluster, pxc_strict_mode will block any DML on tables without primary keys by default. Unfortunately, you may have to keep it permissive for some third-party software.
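A quick way to check it and, only if a third-party schema forces your hand, relax it (a sketch):

    -- ENFORCING, the default, rejects DML on tables without an explicit primary key.
    SHOW GLOBAL VARIABLES LIKE 'pxc_strict_mode';

    -- PERMISSIVE logs a warning instead of failing the statement.
    SET GLOBAL pxc_strict_mode = 'PERMISSIVE';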

Having primary keys everywhere

The best workaround is still to have primary keys on every table, always, even if invisible. Your DELETEs don’t even need to filter on the primary key; it just needs to be defined, and MySQL will use it. sql_require_primary_key, another variable added in 8.0.13, can be used to avoid any mistakes in the future.
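A sketch of both ideas; invisible columns require MySQL 8.0.23 or later, and the table name here is hypothetical:

    -- Add an invisible auto-increment primary key; SELECT * results are unchanged,
    -- but row-based replication can now locate rows by key.
    ALTER TABLE raw_transactions
      ADD COLUMN id BIGINT UNSIGNED NOT NULL INVISIBLE AUTO_INCREMENT PRIMARY KEY FIRST;

    -- Refuse any new table, or any primary key drop, that would leave a table without one.
    SET GLOBAL sql_require_primary_key = ON;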

