10.7 preview feature: InnoDB Bulk Insert

Everyone wants performance. Speeding up the insert into empty InnoDB tables and partitions is a common enough use case to optimize. Thirunarayanan Balathandayuthapani from the MariaDB Corporation wrote this improvement to the bulk inserting into InnoDB tables that is available as a preview for testing.

To achieve this, if the table is empty then InnoDB will pre-sort the records for each index, and build the indexes one page at a time. If the transaction does bulk insert operation then InnoDB should create buffer of size innodb_sort_buffer_size for each index.

If the buffer ran out of memory then InnoDB will create a temporary file for storing the data. Once transaction statement gets committed then InnoDB should sort the buffer/file blocks and do bulk insert operation for each index.

In case of externally stored field, InnoDB bulk insert operation will store the data in temporary created blob file. Criteria to store the field data in blob file is more than 2000 bytes.

A current limitation is that the bulk insert is from a single large INSERT … VALUES SQL, or a LOAD DATA SQL.
For improving load performance disabling foreign_key_checks=0 and unique_checks=0 (in 10.6+ due to MDEV-515) are existing mechanisms.

Example Performance

In the example below, HammerDB has been used to generate a scale 1, TPROCH dataset.

HammerDB Datagen to create tbl files

As the tbl files are one per virtual user when I generated these, they where concatenated so they could be used with LOAD DATA INFILE.

The table schema for these where created with mariadb-dump –no-data tproch from another instance.

These tests where on a basic 8 cpu laptop with 16G RAM, with CPU governor set to performance. To eliminate the performance of the encrypted storage I’ve included innodb_flush_log_at_trx_commit=0 as a mitigation.

Here I’ve compared using this preview, against the 10.7 baseline development version.

$ podman run --rm --detach \
 --env MARIADB_RANDOM_ROOT_PASSWORD=1 \
 --env MARIADB_DATABASE=tproch \
 --env MARIADB_USER=tproch \
 --env MARIADB_PASSWORD=tproch \
 --name mdb107_base \
 -P \
 -v ./:/docker-entrypoint-initdb.d:z \
 quay.io/mariadb-foundation/mariadb-devel:10.7 \
--innodb-buffer-pool-size=6G \
--innodb-log-file-size=4G \
--innodb_flush_log_at_trx_commit=0

The docker-entrypoint-initdb.d contains the tables.sql to create the table on startup.

$ podman port mdb107_base 3306
0.0.0.0:35545

Using the port number above, the following was used to LOAD DATA of the tables in parallel:

$  for t in customer lineitem nation orders part partsupp region supplier; do 
  tbl=${t^^*};
  echo table=$tbl
  time mariadb -h 127.0.0.1 --port 35545  -u tproch -ptproch \
    --local-infile=1 \
    -e "set @@foreign_key_checks=0;
    load data local infile '$t.tbl' into table $tbl FIELDS terminated by '|';" tproch
 done

The following timing results where obtained for each table:

Tag of container:
quay.io/mariadb-foundation/mariadb-devel
customer lineitemnationorderspartpartsuppregionsuppliertotal time
10.70m1.076s1m55.075s0m0.006s0m16.041s0m1.271s0m6.647s0m0.010s0m0.238s140.364s
10.7-mdev-24621-innodb-bulk-insert0m4.175s1m47.870s0m0.006s0m13.670s0m1.191s0m7.718s0m0.006s0m0.261s134.897s
Real time measurements for loading each table

How to try the 10.7 InnoDB Bulk Insert data type preview yourself?

There are two ways to preview this feature:

Tarball

Go to tarball downloads

Container

You can run the container called quay.io/mariadb-foundation/mariadb-devel:10.7-mdev-24621-innodb-bulk-insert as I have performed above.

Feedback Welcome

If you come across any problems in this feature preview, with the design, or especially cases where there is a performance drop, please let us know with a JIRA bug/feature request on the MDEV project. You are welcome to chat about it on Zulip.