pt-archiver Misbehaving With Secondary Index

Not long ago, we found a long-running query in the database of one of our Managed Services clients. While reviewing it, we found it had been generated by a pt-archiver operation. That doesn't sound good, right?

I was able to reproduce the issue in my test environment. In this article, I will show you what happens when you use pt-archiver to purge data from a table using a secondary index. First, we need to have the full picture before going any deeper. Here is a representation of the table structure:
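
Assuming the standard sysbench OLTP schema (exact column sizes may vary in your environment), the table looks roughly like this; the important parts are the primary key on id and the secondary index k_1 on column k:

-- Standard sysbench sbtest table (assumed)
CREATE TABLE sbtest1 (
  id INT NOT NULL AUTO_INCREMENT,
  k INT NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k_1 (k)
) ENGINE=InnoDB;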

As you can see, this test table was generated with sysbench. The criterion defined for the purge process was to remove all rows where column “k” is lower than 1,500,000. In my test environment, this represents around 40K rows.

Examining the pt-archiver operation, we found that the options chosen were reasonable given the archiving criteria and the table structure. These are the pt-archiver options used:
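
A command along these lines reproduces the setup; the connection details and batch sizes here are illustrative, but the relevant parts are the i=k_1 index in the --source DSN, the --where clause, and --primary-key-only:

pt-archiver \
  --source h=localhost,D=dbtest,t=sbtest1,i=k_1 \
  --purge \
  --where "k < 1500000" \
  --primary-key-only \
  --limit 1000 --commit-each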

As you can see, it suggests the usage of index “k_1”, which references the column used in the purging criteria, and it also uses the option “--primary-key-only”, which is recommended if you just want to purge data, as in our case. The “--primary-key-only” option makes pt-archiver include only the primary key columns in the SELECT clause to avoid fetching all columns from the table; this simple feature speeds up the purge process, especially on wide tables.

However, reviewing the statement executed in the database, I found that the running query was not using the suggested index:
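
It was essentially the following (the same statement is quoted in the first comment below), with a hard-coded FORCE INDEX(PRIMARY) hint:

SELECT /*!40001 SQL_NO_CACHE */ id FROM dbtest.sbtest1 FORCE INDEX(PRIMARY) WHERE (k < 1500000) AND (id < '4000000') ORDER BY id LIMIT 1000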

Now, if we get the actual explain plan for this query, we can see it is not optimal at all: it examines almost two million rows and forces the usage of the “PRIMARY” index. No wonder this was a long-running query.

But why is pt-archiver doing this? Why isn’t it using the suggested index? After playing around with pt-archiver options, I realized that the usage of the “--primary-key-only” option inhibits the usage of the suggested secondary index. This was initially reported in ticket PT-157 and marked as fixed. However, we can confirm that the most recent version of pt-archiver, 3.4.0, still shows this behavior, so I opened ticket PT-2112 to follow up on the fix.

Anyways, we need to fix the issue for our client, and to do so, we just need to remove the “--primary-key-only” option, which allows pt-archiver to use the suggested index. The downside of doing this is that it will fetch all columns from the table. We can see the SQL statement pt-archiver will generate by specifying the “--dry-run” option; this is a cool feature for checking whether the generated statement is optimal.
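
For example, running the adjusted command with --dry-run (connection details are again illustrative) prints the statements it would execute, and the generated SELECT now uses the k_1 index, in the form also quoted in the comments below:

pt-archiver \
  --source h=localhost,D=dbtest,t=sbtest1,i=k_1 \
  --purge \
  --where "k < 1500000" \
  --limit 1000 --commit-each \
  --dry-run

SELECT /*!40001 SQL_NO_CACHE */ id,k,c,pad FROM dbtest.sbtest1 FORCE INDEX(k_1) WHERE (k < 1500000) AND ((k >= ?)) ORDER BY k LIMIT 1000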

If we examine the explain plan for this generated statement, we can see that the query cost of the new statement is up to 7x lower than the previous one, and we are fetching only 79K rows compared to the 2M rows of the previous query. So, definitely a better query.
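
If you want to reproduce the comparison, you can run EXPLAIN on both forms of the statement; the exact row estimates depend on your data set, and EXPLAIN FORMAT=JSON additionally reports the optimizer’s estimated query cost:

EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ id FROM dbtest.sbtest1 FORCE INDEX(PRIMARY) WHERE (k < 1500000) AND (id < '4000000') ORDER BY id LIMIT 1000;

EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ id,k,c,pad FROM dbtest.sbtest1 FORCE INDEX(k_1) WHERE (k < 1500000) ORDER BY k LIMIT 1000;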

Conclusion

Hopefully, we will see this issue fixed in the future, but for now, we can use pt-archiver as shown above to take advantage of secondary indexes. Also, as a recommendation, always use the “--dry-run” option to test your pt-archiver command and make sure the query produced is optimal for your database.

Comments
Fernando Mario

Well done, Mike

I actually think doing things in Primary Key order is the right thing to do. But not the way pt-archiver is doing it currently.

By deleting using the secondary index, there are probably a lot of random PK lookups done in the table, and if more than one row per InnoDB page is deleted, the same page might end up being loaded into the buffer pool many times, which wastes IOs. Obviously, this will not be the case if you only delete a small percentage of the rows, but when deleting by a secondary index, there is a horrible worst-case scenario of loading the table many times.

By deleting things in PK order, the behavior is predictable (always reading the full table once), which is a better worst case than reading the table N times (say, 16 times if there are 16 rows to delete in each page and this is done in random order with the table not fitting in RAM --> a 1 TB table might end up being read 16 times!).

But when deleting things in PK order, we need to be sure things are “bounded”. The query below can read up to 4,000,000 rows (4M) if all rows from id 0 to 4M exist and have k > 1,500,000 (1.5M). This is very close to unbounded.

  • SELECT /*!40001 SQL_NO_CACHE */ id FROM dbtest.sbtest1 FORCE INDEX(PRIMARY) WHERE (k < 1500000) AND (id < '4000000') ORDER BY id LIMIT 1000

The right way of doing this is to first extract a bounded range (100K rows in this case, which is 40 times less than above and should be done in roughly constant time). We start with :start_of_range being 0; the next :start_of_range will be the id returned by this query, until it is null, in which case we need a last pass up to max(id):

  • SELECT /*!40001 SQL_NO_CACHE */ id as end_of_range FROM dbtest.sbtest1 FORCE INDEX(PRIMARY) WHERE (id >= :start_of_range) ORDER BY id LIMIT 1 OFFSET 100000

Then, in that range, delete all rows with k < 1.5M by running the query below many times, deleting 1K rows max each time:

  • DELETE FROM dbtest.sbtest1 WHERE (id >= :start_of_range) AND (id < :end_of_range) AND (k < 1500000) LIMIT 1000

And then grab the next range. The size of the range should be such that it stays in the buffer pool while the deletes are done.
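
A minimal sketch of that loop as a MySQL stored procedure, assuming the same sbtest1 table (the procedure name, 100K range size, and 1K batch size are illustrative; error handling and throttling are omitted):

-- Hypothetical procedure sketching the bounded-range purge described above.
DELIMITER $$
CREATE PROCEDURE purge_sbtest1_by_pk_range()
BEGIN
  DECLARE start_of_range BIGINT DEFAULT 0;
  DECLARE end_of_range BIGINT;
  DECLARE last_pass BOOLEAN DEFAULT FALSE;

  range_loop: LOOP
    -- Find the end of the next ~100K-row primary-key range.
    SET end_of_range = NULL;
    SELECT id INTO end_of_range
      FROM dbtest.sbtest1 FORCE INDEX(PRIMARY)
     WHERE id >= start_of_range
     ORDER BY id LIMIT 1 OFFSET 100000;

    -- No row at that offset: this is the last pass, bounded by MAX(id).
    IF end_of_range IS NULL THEN
      SET last_pass = TRUE;
      SELECT MAX(id) + 1 INTO end_of_range FROM dbtest.sbtest1;
      IF end_of_range IS NULL THEN
        LEAVE range_loop;  -- table is empty
      END IF;
    END IF;

    -- Delete the matching rows inside the bounded range, 1K rows at a time.
    delete_loop: LOOP
      DELETE FROM dbtest.sbtest1
       WHERE id >= start_of_range AND id < end_of_range AND k < 1500000
       LIMIT 1000;
      IF ROW_COUNT() < 1000 THEN
        LEAVE delete_loop;  -- range exhausted
      END IF;
    END LOOP;

    IF last_pass THEN
      LEAVE range_loop;
    END IF;
    SET start_of_range = end_of_range;
  END LOOP;
END$$
DELIMITER ;

-- Usage: CALL purge_sbtest1_by_pk_range();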

Note that the described algorithm is very lean in the number of ids fetched client-side (only the begin and end of each range), compared to what the other queries are doing (fetching large chunks of ids).

Also note that the described algorithm can still be improved, because the delete might scan delete-marked rows many times (if purge is lagging behind). Optimizing this is left as an exercise to the reader (hint: it involves a SELECT before the DELETE and narrowing down the range as the delete progresses).

Luke

I have a similar scenario; my concern here is that the SELECT pt-archiver generates cannot guarantee uniqueness and correctness.

SELECT /*!40001 SQL_NO_CACHE */ id,k,c,pad FROM dbtest.sbtest1 FORCE INDEX(k_1) WHERE (k < 1500000) AND ((k >= ?)) ORDER BY k LIMIT 1000

Indeed, if you archive and delete the original rows, it will be fine, but if you use --no-delete, I suspect the SELECT will archive duplicated rows. E.g., if there are 1,100 rows with k=100, the first run will pick 1,000 rows, with the last k still being 100; then in the next run:

SELECT /*!40001 SQL_NO_CACHE */ id,k,c,pad FROM dbtest.sbtest1 FORCE INDEX(k_1) WHERE (k < 1500000) AND ((k >= 100)) ORDER BY k LIMIT 1000

There is no guarantee MySQL will not pick the rows already fetched in the first run.

In fact, I think the correct SELECT should be something like:

SELECT /*!40001 SQL_NO_CACHE */ id,k,c,pad FROM dbtest.sbtest1 FORCE INDEX(k_1) WHERE (k < 1500000) AND ((k = 100 AND id > ?) OR (k > 100)) ORDER BY k, id LIMIT 1000

The above query can both use the index k_1 and guarantee that no duplicate rows will be picked.
