At the end of 2021, I pushed the first official Docker image to hub.docker.com, and since then we have been improving our testing and packaging procedures based on Docker, CircleCI, and GitHub Actions. However, when I'm coding, I don't test in Docker. But a couple of weeks ago, while reviewing an issue, I realized there are some interesting Docker use cases that I want to share.
Common use case
First, to warm you up, we are going to review how to take a simple backup with MyDumper:
```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 \
    -o /backups/data \
    -B test -v 3 \
    -r 1000 \
    -L /backups/mydumper.log"
```
You will find the backup files and the log in ${backups}. You can then restore the backup using:
```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "myloader -h 172.17.0.4 \
    -d /backups/data \
    -B test -v 3 -o \
    -L /backups/myloader.log"
```
And if you want to do it faster, you can run both steps at once:
```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 \
    -o /backups/data \
    -B test -v 3 \
    -r 1000 \
    -L /backups/mydumper.log ; \
  myloader -h 172.17.0.4 \
    -d /backups/data \
    -B test -v 3 -o \
    -L /backups/myloader.log"
```
We can remove the option to mount a volume (-v ${backups}:/backups), as the data will reside inside the container.
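For instance, a minimal sketch of the dump-and-load one-liner without the volume mount; note the mkdir, which I added here because nothing creates /backups when no volume is mounted:

```
# Same one-shot dump and load, but with no host volume: the backup
# only ever exists inside the container and is discarded when the
# container is removed (--rm).
docker run --name mydumper --rm \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "mkdir -p /backups && rm -rf /backups/data; \
  mydumper -h 172.17.0.5 -o /backups/data -B test -v 3 -r 1000 \
    -L /backups/mydumper.log ; \
  myloader -h 172.17.0.4 -d /backups/data -B test -v 3 -o \
    -L /backups/myloader.log"
```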
Advanced use case
Since version 0.14.4-7, the Docker image has been built with ZSTD instead of GZIP because it is faster. Other options that are always useful are --rows/-r and --chunk-filesize/-F. On the latest releases, you can pass '100:1000:0' to -r, which means:
- 100 is the minimum chunk size
- 1000 is the starting point
- 0 means that there won't be a maximum limit
In this case, we want small files that can be sent to myloader as soon as possible, and we don't care about the number of files, so -F will be set to 1.
In the next use case, we are going to pipe the backup from mydumper's stdout to myloader, streaming the content without sharing the backup directory:
```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 \
    -o /backups/data \
    -B test -v 3 \
    -r 100:1000:0 \
    -L /backups/mydumper.log \
    -F 1 --stream -c | \
  myloader -h 172.17.0.4 \
    -d /backups/data_tmp \
    -B test -v 3 -o \
    -L /backups/myloader.log \
    --stream"
```
In this case, backup files are created in /backups/data, sent through the pipe, and stored in /backups/data_tmp until myloader imports each backup file and then removes it.
To optimize this procedure, we can now share the backup directory and set --stream to NO_STREAM_AND_NO_DELETE, which streams only the filename rather than the content of the file, and does not delete the file, as we want it to be shared with myloader:
```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 \
    -o /backups/data \
    -B test -v 3 \
    -r 100:1000:0 \
    -L /backups/mydumper.log \
    -F 1 --stream=NO_STREAM_AND_NO_DELETE -c | \
  myloader -h 172.17.0.4 \
    -d /backups/data \
    -B test -v 3 -o \
    -L /backups/myloader.log \
    --stream"
```
As you can see, the directory is the same. Myloader will delete the files after importing them, but if you want to keep the backup files, you should use --stream=NO_DELETE.
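As a sketch, assuming --stream=NO_DELETE goes on the myloader side here, the previous pipeline would only change its last option:

```
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 -o /backups/data -B test -v 3 \
    -r 100:1000:0 -L /backups/mydumper.log -F 1 \
    --stream=NO_STREAM_AND_NO_DELETE -c | \
  myloader -h 172.17.0.4 -d /backups/data -B test -v 3 -o \
    -L /backups/myloader.log --stream=NO_DELETE"
# After the import finishes, the backup files remain in ${backups}/data.
```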
The performance gain will vary depending on the database size and the number of tables. This can also be combined with another MyDumper feature, masquerading your backups, which allows you to build safer QA/testing environments.
Conclusion
MyDumper, which has already proven to be the fastest logical backup solution, now offers a simple and powerful way to migrate data in a Dockerized environment.
For large tables (e.g., > 1 TB), how does this perform? How much disk space would you recommend I allocate to Docker in such a case? Twice the size of the largest table?
Eric, you need to consider several things. There is so much to take into consideration that I think the best approach is to start with half the size of the database and reduce it once you get a better idea of the space used.
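One way to get that starting number, as a sketch: sum the data and index sizes for the schema from information_schema and take half (this will overestimate if ZSTD compresses well and underestimate if it doesn't; the host and schema below are the ones from the examples above):

```
# First guess for the backup volume size: half of the schema's
# on-disk footprint (data + indexes). Adjust after a few real runs.
mysql -h 172.17.0.5 -e \
  "SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024 / 2, 2)
     AS first_guess_gb
   FROM information_schema.tables
   WHERE table_schema = 'test';"
```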
When the backup directory is specified with --directory (-d), myloader terminates with this error:

[ERROR] - Backup directory (-d) must not exist when --stream / --stream=TRADITIONAL

How can I dump and load in stream mode without writing files twice, using mydumper --stream=NO_STREAM_AND_NO_DELETE with myloader --stream?
According to the latest code, myloader supports --stream=NO_STREAM, which is missing from the help, but it really works with:

mydumper --stream=NO_STREAM_AND_NO_DELETE | myloader --stream=NO_STREAM
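For reference, a sketch of that combination in the Docker image from the post (I have not verified --stream=NO_STREAM myself; it is the undocumented value the commenter found in the code):

```
# Only filenames travel through the pipe; each file is written once in
# /backups/data and read in place by myloader.
docker run --name mydumper --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c \
  "rm -rf /backups/data; \
  mydumper -h 172.17.0.5 -o /backups/data -B test -v 3 \
    --stream=NO_STREAM_AND_NO_DELETE -c | \
  myloader -h 172.17.0.4 -d /backups/data -B test -v 3 -o \
    --stream=NO_STREAM"
```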