Friday, October 22, 2021

Does Column Order Matter in MySQL Multi Column Indexes

    Multi column indexes are a powerful way to speed up queries but they are often misunderstood.  In most other databases an index on columns a, b, and c can only be used when searching on columns (a,b,& c), (a & b), and (a)  -- according to the manual. Also that index supposedly can not be used to search for (b & c) or just (c).  Well, that is the way I learned it and the way I have been teaching it. But I was wrong!  Now would be a good time to read the MySQL manual on Multiple-Column Indexes as it does not work as noted (or see the excerpt below) and I assumed MySQL worked the same way as the other databases. Well, it doesn't!

Doubt me?  Well, lets create table and add in some data. 

Table and Data

SQL > create table abcd (a serial auto_increment primary key, b int, c int, d int);

Query OK, 0 rows affected (0.0404 sec)

 SQL > insert into abcd values (null,1,2,3),(null,4,5,6),(null,7,8,9);

Query OK, 3 rows affected (0.0081 sec)

Records: 3  Duplicates: 0  Warnings: 0

SQL > select * from abcd;

+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 1 | 2 | 3 |
| 2 | 4 | 5 | 6 |
| 3 | 7 | 8 | 9 |
+---+---+---+---+

3 rows in set (0.0006 sec)

And then we need the index

SQL > create index bcd_index on abcd(b,c,d);


Testing 

    The first test we use the data from the first row where we look for the three columns (b,c,d) in the order specified in the creation of the index.  And guess what? It works as expected and uses the bcd_index.

SQL > explain format=tree select * from abcd where b=1 and c=2 and d=3\G
*************************** 1. row ***************************
EXPLAIN: -> Covering index lookup on abcd using bcd_index (b=1, c=2, d=3)  (cost=0.35 rows=1)

1 row in set (0.0006 sec)


    Leaving on the last column, searching on (b,c) also works as expected.

SQL > explain format=tree select * from abcd where b=1 and c=2\G
*************************** 1. row ***************************
EXPLAIN: -> Covering index lookup on abcd using bcd_index (b=1, c=2)  (cost=0.35 rows=1)

1 row in set (0.0008 sec)

    As does searching on just the first column (b)

SQL > explain format=tree select * from abcd where b=1\G
*************************** 1. row ***************************
EXPLAIN: -> Covering index lookup on abcd using bcd_index (b=1)  (cost=0.35 rows=1)

1 row in set (0.0006 sec)


    But what if we skip the (c) column, the one in the middle?  Well, I had thought that since (b,c) was not part of (b,c,d) or (b,c) or (b) as defined in the index then it could not use the index.

SQL > explain format=tree select * from abcd where b=1 and d=3\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (abcd.d = 3)  (cost=0.28 rows=0)
    -> Covering index lookup on abcd using bcd_index (b=1)  (cost=0.28 rows=1)

1 row in set (0.0007 sec)

    Well, I thought, maybe TREE format from EXPLAIN was not giving me enough data. So rerun EXPLAIN without TREE.

SQL > explain  select * from abcd where b=1 and d=3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: abcd
   partitions: NULL
         type: ref
possible_keys: bcd_index
          key: bcd_index
      key_len: 5
          ref: const
         rows: 1
     filtered: 33.333335876464844
        Extra: Using where; Using index

1 row in set, 1 warning (0.0008 sec)
Note (code 1003): /* select#1 */ select `fk`.`abcd`.`a` AS `a`,`fk`.`abcd`.`b` AS `b`,`fk`.`abcd`.`c` AS `c`,`fk`.`abcd`.`d` AS `d` from `fk`.`abcd` where ((`fk`.`abcd`.`d` = 3) and (`fk`.`abcd`.`b` = 1))


    Okay, I know that on a (b,c,d) index that it is not supposed to work with a (d) search. Boy, was I wrong.

SQL > explain format=tree select * from abcd where d=3\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (abcd.d = 3)  (cost=1.30 rows=1)
    -> Index scan on abcd using bcd_index  (cost=1.30 rows=3)

1 row in set (0.0012 sec)

But the Manual!?!

The manual states:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index.

Well shoot!  This is a case when there is a difference between what the manual says and what the results are telling me.  Frankly, I find this pretty cool even if it makes me rethink the way I create indexes.   MySQL  is more flexible which gives you better performance