multithreaded alter tablept-online-schema-change emulates the way that MySQL alters tables internally, but it works on a copy of the table you wish to alter. It executes INSERT statements to import the data, that runs in a single connection to fill the new table. In this repository, there is a script called myloader_pt-osc.sh that uses myloader to execute parallel INSERTs, instead of using pt-online-schema-change internal tooling, but it also uses a patched version of pt-online-schema-change.

Patch pt-online-schema-change

There are 2 changes that we needed to perform on pt-online-schema-change:

1- Be able to stop the execution after the copy of the table is created
2- Tell pt-online-schema-change that no data needs to be imported

We are adding the –no-data to tell pt-online-schema-change to do not import the data to the new table. pt-online-schema-change is able to stop the execution with –pause-file option. The patch file includes the code to stop the execution when –no-data and –pause-file is used.

Procedure

The script is performing several tasks, the main ones are:

  • Creating a backup directory in a format that myloader is able to use.
  • Starting the pt-online-schema-change which is going to stop after creating the new table and triggers.
  • Start the myloader execution.
  • Resume pt-online-schema-change, after myloader finishes, to swap the tables.

Take into consideration that the script is not designed to support multi-column nor non-integer primary keys. Do not use it in production environments without testing it thoroughly.

Timings

A sysbench table with 10M rows was created to perform an alter table with “engine innodb” which is like an optimize table, I used these two executions to compare:

The chunk-size and the number of threads were variables to create these graphs:

 

pt-online-schema-change MySQL

 

In the next graph is the improvement in percentage:

 

myloader mysql

 

Use Cases

I consider that this method or strategy should be used on specific scenarios, as there are considerations when not to use it, for instance:

  • The pt-online-schema-change throttle mechanism will not slow down the insert, slaves might get behind.
  • As the intention is to use multiple threads, the server will receive more load and no mechanism in place to stop it.

Conclusion and Expectation

Basically, it is a pt-online-schema-change wrapper that uses myloader to perform the INSERTs in parallel which makes a faster alter table. It might be used when times need to be reduced and the load is not an issue.

I would like to see this implemented inside pt-online-schema-change as a possible strategy but until this happens, this might be an alternative.

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
laimisnd

Hi

A very interesting approach of using the parallel myloader.

One question.
How is the shared lock/lost delete handled if myloader is used?
Forget if I am a bit rusty but this is pt-osc sql:
INSERT LOW_PRIORITY IGNORE INTO … SELECT … FORCE INDEX ( PRIMARY ) WHERE ( ( id >= ? ) ) AND ( ( id <= ? ) ) LOCK IN SHARE MODE

You get the idea, lock + triggers + insert with ignore handles the concurency, lost delete mostly (but could be other phenomenas also).

Y T

NEVER USE SCRIPT, YOU WILL LOSS DATA WITHOUT WARNING !!!!!!!!!!!!!!!!!!!!!!

Used script to replace indexes, got EMPTY TABLE as result.

./myloader_py.sh -D bc -T ohlcv –chunk-size 10000 -t 40 –alter ‘DROP INDEX idx2 , ADD INDEX idx2(exchange ASC, pair ASC, ts_close ASC, close ASC)’

Used script to alter table without primary index.
Sun Jun 7 23:59:16 UTC 2020: Starting pt-osc
ERROR 1054 (42S22) at line 1: Unknown column ‘id’ in ‘field list’
Sun Jun 7 23:59:16 UTC 2020: Creating the INSERT INTO files
Sun Jun 7 23:59:16 UTC 2020: All pids finished
Sun Jun 7 23:59:16 UTC 2020: Starting myloader

** (myloader:79380): CRITICAL **: 23:59:16.630: Error restoring bc.ohlcv from file bc.ohlcv.1.sql: Unknown column ‘id’ in ‘where clause’

** (myloader:79380): CRITICAL **: 23:59:16.630: Error restoring bc.ohlcv from file bc.ohlcv.0.sql: Unknown column ‘id’ in ‘where clause’
Sun Jun 7 23:59:16 UTC 2020: Removing pause file
Sun Jun 7 23:59:16 UTC 2020: Waiting to pt-osc to rename tables…