How to Check Multiple Tables Are Empty in SQL

2020-06-08 Look Out! Empty Tables Ahoy!

I encountered this problem recently for a Django ticket for speeding up part of Django’s test runner. We have a list of tables to truncate. How can we figure out which are already empty, so we can skip issuing a somewhat costly TRUNCATE TABLE on them? You might think the database could skip TRUNCATE TABLE on empty tables, but it normally does work beyond removing the rows such as resetting auto-increment counters. In MySQL, TRUNCATE TABLE is even equivalent to DROP TABLE and CREATE TABLE, which requires removing and creating the table file on disk.

You might think the easiest way to find out which tables are empty would be to count the rows in each table:

mariadb [1]> SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
|      123 |
+----------+
1 row in set (0.002 sec)

mariadb [2]> SELECT COUNT(*) FROM table2;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.001 sec)

This works, but it’s slow. Database systems don’t keep a counter of the number of rows in each table, because the overhead to keep it consistent with transactions would be too high. Therefore a COUNT(*) means touching every individual row, obtaining read locks on each of them.

Since we only care about seeing if the table is empty, if we find at least one row we know it’s not. Therefore we can instead query for a single row:

mariadb [3]> SELECT id FROM table1 LIMIT 1;
+------+
| id   |
+------+
| 1234 |
+------+
1 row in set (0.000 sec)

mariadb [4]> SELECT id FROM table2 LIMIT 1;
Empty set (0.000 sec)

This query returns an empty result set if the table is empty, or a single ID an arbitrary row otherwise.

We don’t care about the ID itself. We can therefore select a static value instead:

mariadb [5]> SELECT 1 as non_empty FROM table1 LIMIT 1;
+-----------+
| non_empty |
+-----------+
|         1 |
+-----------+
1 row in set (0.001 sec)

mariadb [6]> SELECT 1 as non_empty FROM table2 LIMIT 1;
Empty set (0.000 sec)

This query again returns an empty result set if the table is empty, or 1 if not.

We can use this technique against multiple tables at once using SQL’s UNION clause. By selecting the table names as our static values, under the same column name, the combined result set will be a list of the non-empty tables:

mariadb [7]> (SELECT 'table1' as non_empty_tables FROM table1 LIMIT 1)
    -> UNION (SELECT 'table2' as non_empty_tables FROM table2 LIMIT 1);
+------------------+
| non_empty_tables |
+------------------+
| table1           |
+------------------+
1 row in set (0.000 sec)

You can UNION many such queries together - databases tend to have a limit on query length or opened tables, but it’s quite high.

And this is the fastest way I can think of for checking which tables in a list are empty, accurately.

What about INFORMATION_SCHEMA?

I should mention INFORMATION_SCHEMA.TABLES, the information schema metadata table-of-tables. In theory this is the right place to ask meta-level questions about your tables, including which ones are empty.

On MariaDB/MySQL, this includes a table_rows column that you could use to check which tables are empty. Unfortunately it’s only an estimate, so you might get false positives or negatives. Other databases have similar estimates, such as SQL Server’s sys.partitions.rows, but I believe they all carry risks of false positives and negatives.

Fin

Hope this helps,

—Adam

Read my book Boost Your Git DX to Git better.

One summary email a week, no spam, I pinky promise.

Tags: django, sql