Galera 4 Streaming Replication in Percona XtraDB Cluster

I was testing the latest Percona XtraDB Cluster 8.0 (PXC) release, which has the Galera 4 plugin, and I would like to share my experiences and thoughts on the Streaming Replication feature so far.

What Is Streaming Replication, in One Sentence?

In Galera 4, a large transaction can be split into smaller fragments, and even before it is committed, these fragments are replicated to the other nodes, where the certification and apply process starts right away.

The manual describes all the pros and cons, but let's see how it works in practice. I created a table with 10M rows, and I am going to run some large updates on it.
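The original table definition is not shown in the post; as a sketch, a sysbench-style table in this spirit (the name `sbtest1` and the columns are my assumptions) could look like this:

```sql
-- Hypothetical test table; the real schema from the test is not shown.
CREATE TABLE sbtest1 (
  id INT NOT NULL AUTO_INCREMENT,
  k  INT NOT NULL DEFAULT 0,
  c  CHAR(120) NOT NULL DEFAULT '',
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- ...then populated with 10M rows, for example with sysbench.
```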

First, I ran the updates without Streaming Replication. Because it is disabled by default, we do not have to do anything but run the update. On node1 I am recording the time before and after the update, and on node2 I am running a select every second to see when the update actually gets committed on the other nodes.

On the writer:
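The exact statement is not shown here; assuming the hypothetical `sbtest1` table and its `k` column, a large update in this spirit would be:

```sql
SELECT NOW();
UPDATE sbtest1 SET k = 1;   -- touches all 10M rows in a single transaction
SELECT NOW();
```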

It took around 34s to run this query.

Check the other node:
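As a sketch (again assuming the hypothetical `sbtest1` table), the check I was repeating on node2 every second would look like:

```sql
-- Re-run every second on node2 to see when the update becomes visible:
SELECT NOW(), k FROM sbtest1 WHERE id = 1;
```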

As I said, I was running a query every second to see when k changes to 1. It took 28s for node2 to certify, apply, and commit this update. It is important to understand that while the node is certifying this large update, all other changes, on any table, are blocked, so the node basically stalls/freezes for almost 30s.

Will Streaming Replication Decrease This Delay?

In theory, the other nodes should catch up faster because, while the update is running on node1, the fragments are already being replicated and applied on the other nodes.

I am going to set the fragment size to 1MB.
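Streaming Replication is controlled per session through the `wsrep_trx_fragment_unit` and `wsrep_trx_fragment_size` variables; setting a 1MB fragment size looks like this:

```sql
SET SESSION wsrep_trx_fragment_unit = 'bytes';
SET SESSION wsrep_trx_fragment_size = 1048576;  -- 1MB fragments
```

Setting `wsrep_trx_fragment_size` back to 0 disables Streaming Replication for the session again.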

Now it took 40s to run. Let’s see the other node.

In the same second, k was already 2 on node2 as well. We no longer have the 28s delay, which is great, but the query itself got a bit slower. I ran these updates many times, and it was always faster without Streaming Replication, but with large transactions I would still prefer Streaming Replication, because the other nodes will not drift too far behind.

Why Could It Be Slower With Streaming Replication?

One of the reasons could be that, with Streaming Replication, Galera basically double-writes the write-sets: it records them in the mysql.wsrep_streaming_log table so that in-progress streaming transactions can be recovered after a crash as well. This is why it is not a good idea to enable Streaming Replication for all queries.

Does the Fragment Size Matter?

Let’s change the fragment size to 0.1MB:
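With the unit still set to bytes, the smaller fragment size would be set like this:

```sql
SET SESSION wsrep_trx_fragment_size = 102400;  -- 100KB (~0.1MB) fragments
```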

Now it took more than 50s, so, as we can see, size matters. I also ran a test where I increased the fragment size to 100MB; in that case, the query took around 40s again, but the other nodes needed more time to catch up, around 10s.

Finding the right fragment size is important. In my tests 1MB was acceptable, but you could also define the fragment size in rows, creating a fragment after every 10000 rows, for example.
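Row-based fragmentation just means switching the fragment unit:

```sql
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;  -- new fragment after every 10000 rows
```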

Locking

Before Galera 4, locks were not propagated to the other nodes. If you were running a query on node1, InnoDB locked the necessary rows on node1 only; node2 and node3 did not know which rows were locked. If someone was writing to node2 or node3, Galera had to handle the resulting conflicts: basically, the transaction that committed first won. If you are writing to multiple nodes, you can get a lot of conflicts because of this.

But with Streaming Replication this has also changed: the fragments are replicated to the other nodes, and they hold the necessary locks there as well.

Let’s see how this works, first without Streaming Replication, step by step.

On node1 and node2 I start a transaction; on node1 I run a large update, on node2 I delete a single row, then I commit the transaction on node2. When I commit on node1, I get a deadlock, as we expect:
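A minimal sketch of these steps (the table, the row, and the exact statements are my assumptions, not the originals from the test):

```sql
-- node1
BEGIN;
UPDATE sbtest1 SET k = 2;            -- large update; locks rows on node1 only

-- node2
BEGIN;
DELETE FROM sbtest1 WHERE id = 100;  -- node2 does not see node1's locks
COMMIT;                              -- commits first, so it wins certification

-- node1
COMMIT;                              -- fails during certification with a deadlock error
```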

Because there is no cluster-wide locking in this case, Galera uses optimistic locking, and the transaction fails with an error during certification.

How Does This Work With Streaming Replication?

Repeat the same steps:
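A sketch of the same scenario with Streaming Replication enabled on node1 (again using the hypothetical `sbtest1` table):

```sql
-- node1 (Streaming Replication enabled for this session)
SET SESSION wsrep_trx_fragment_unit = 'bytes';
SET SESSION wsrep_trx_fragment_size = 1048576;
BEGIN;
UPDATE sbtest1 SET k = 2;            -- fragments replicate and take locks on all nodes

-- node2
BEGIN;
DELETE FROM sbtest1 WHERE id = 100;  -- hangs, waiting for the locks replicated from node1
-- ...it proceeds only after node1 commits
```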

Same steps, but when we try to run the delete on node2, it just hangs there, waiting for the locks that are actually coming from node1 because of Streaming Replication. When I commit the transaction on node1, node2 can run the delete query. This could save us from a lot of deadlocks and conflicts: queries might have to wait for locks, but in the end they should still finish.

This sounds great, but I would still not recommend enabling Streaming Replication by default, as it has a performance impact and it double-writes everything into the mysql.wsrep_streaming_log table. Also, if you want to roll back a big transaction, it has to be rolled back on all the nodes, not just one.

Metrics/Monitoring

There is a table called mysql.wsrep_streaming_log which contains information about the currently running streaming transactions.
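A safe way to inspect it is to select only the metadata columns, like this:

```sql
-- Leave out the frag column: it holds the whole replication event and can be huge.
SELECT node_uuid, trx_id, seqno, flags
FROM mysql.wsrep_streaming_log;
```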

But be careful, because the column called frag contains the whole binary log replication event, which can be quite large. Before I realized that, I got the following errors:

Unfortunately, as far as I know, there is no counter to monitor how many queries are replicated by Streaming Replication. I created a feature request to cover this.

Conclusion

So far, Streaming Replication works very well, and I think it has great potential. Because it can be enabled per session, you or the application can easily turn it on only when it is necessary. I am also looking forward to seeing real-life experiences with it.