myloader Stops Causing Data FragmentationDuring the development of the myloader –innodb-optimize-keys option, which was released in version 0.10.7, we found several issues and opportunities to improve the process. We had to change the approach, reimplement some of the core functionality and add a couple of data structures. That allowed us to implement, at a really low cost, a feature that executes the files that contain INSERT statements, sorted by Primary Key. This is desirable to reduce page splits, which cause on-disk tablespace fragmentation.

In this blog post, I will present the differences in data fragmentation for each version.

Test Details

These are local vm tests as there is no intention to show performance gain.

The table that I used is:

And I inserted the data with:

The graphs below were made with innodb_ruby (more info about it in this blog post) and based on a table of 131K rows with –rows 100. The intention of this test was to create a lot of files that will cause better spread in the Primary Key. The timings are over the same table structure but the table has 32M rows. Finally, I performed the test with 1 and 4 threads and with –innodb-optimize-keys when possible.

Tests Performed

In myloader v0.10.5 there was no file sorting, which is why we can see that lower Primary Key values were updated recently:

   

It doesn’t matter the number of threads, we can see how pages, across the whole file, are being updated at any time. 

This is happening because mydumper exported the files in order with these min_id and max_id values:

Filemin_idmax_id
test.perf_test.00000.sql121261
test.perf_test.00001.sql2126242522
test.perf_test.00002.sql4252349137
test.perf_test.00003.sql6552185044
test.perf_test.00004.sql8504598288
test.perf_test.00006.sql131056148827
test.perf_test.00007.sql148828170088
test.perf_test.00008.sql170089191349
test.perf_test.00009.sql191350196591
test.perf_test.00012.sql262126276393

But, during import, there was no order, let’s see the log:

With these max_id and max_id per file:

Filemin_idmax_id
test.perf_test.01476.sql3138123731402497
test.perf_test.00087.sql18497081870968
test.perf_test.01484.sql3155132531572585
test.perf_test.01067.sql2268548822706748
test.perf_test.00186.sql39545473975807
test.perf_test.01032.sql2194135321962613

With this kind of insert order, you can only imagine the amount of page splits that cause the fragmentation in the InnoDB datafile.

Timings were:

In v0.10.7-2 we have the same behavior:

 

But we have a small performance increase:

We see the same pattern, even if we use the –innodb-optimize-keys:

 

The main difference is the index creation stage.

Now, in v0.10.9, where we have table and file sorting, the graphs have a significant change: 

 

It is also a bit shocking the difference between the 2 graphs, not about color trending, but about the number of pages used which indicates a high fragmentation when multiple threads are used.

Let’s check now with –innodb-optimize-keys:

 

This is what we are looking for! As you can see with 1 thread is perfect, but with 4 threads there is some odd distribution, but for sure, much better than the other options.

However, the timings are not the best:

Let’s compare them:

But that makes sense if you read this blog post. Actually, it would be really nice to have a feature that determines when –innodb-optimize-keys needs to be used.

Conclusions

Version 0.10.9 of MyDumper will allow myloader to insert better than previous versions. Multithreaded inserts sorted by Primary Key are now possible and faster than ever!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments