Percona Toolkit and pt-online-schema-changeRecently I was dealing with an unexpected issue raised by our Support customer, in which data became inconsistent after a schema change was applied.

After some investigation, it turned out that affected tables had a special word in the comments of some columns, which triggered an already known (and fixed) issue with the TableParser.pm library of Percona Toolkit.  The problem is that the customer was using an outdated Toolkit version, where pt-online-schema-change was using that buggy parser.

This bug applies only to Percona Toolkit versions up to 3.0.10, so if you have already 3.0.11 or newer installed, you can skip the rest of this post as these are no longer affected.

I am writing this post to warn every user of pt-online-schema-change who has not upgraded the toolkit, as the problem is potentially very dangerous and can lead to the silent loss of data. 

The problem can manifest in two ways. The first one, although confusing, is not really dangerous as the operation is canceled. It happens when the columns with culprit comments do not allow NULL values. For example:

Schema change operation on this one will look like this:

So the reason for the failed operation may be unclear, but at least no data gets damaged. A much worse result happens when the column with comment allows nulls:

For this one, the schema change command runs without any errors:

But… the table data is not the same after:

Summarizing, it is essential to make sure you are using the up to date Percona Toolkit, especially the pt-online-schema-change tool, to avoid potential disaster. The current latest stable release, as of when I am writing the post, is version 3.2.1, and the fixed version for this particular bug, 3.0.11, was released in July 2018.

References:

https://www.percona.com/doc/percona-toolkit/3.0/release_notes.html
https://jira.percona.com/browse/PT-1570