Queries for Finding Poorly-Designed MySQL SchemasIf you watched Finding Poorly Designed Schemas and How to Fix Them you witnessed Marcos Albe use some very interesting queries. These queries let you find tables without primary keys, tables with non-integer primary keys, tables that do not use InnoDB, tables and indexes with the most latency, indexes that are 50% larger than the table, find duplicate indexes, and find unused indexes. As promised, they are below.

— Find tables without PK
SELECT t.table_schema,t.table_name,t.engine
FROM information_schema.tables t
JOIN information_schema.columns c
ON t.table_schema=c.table_schema
AND t.table_name=c.table_name
WHERE t.table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’)
AND t.table_type = ‘BASE TABLE’
GROUP BY t.table_schema,t.table_name, t.engine
HAVING SUM(IF(column_key IN (‘PRI’,’UNI’), 1,0)) = 0;

— Find tables with non-integer PK’s
SELECT table_schema, table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE column_key IN (‘PRI’,’UNI’)
AND ordinal_position=1
AND data_type NOT IN (‘tinyint’, ‘smallint’, ‘mediumint’, ‘int’, ‘bigint’, ‘timestamp’, ‘datetime’)
AND table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’);

— Find tables not using InnoDB
SELECT t.table_schema,t.table_name,t.engine
FROM information_schema.tables t
WHERE t.table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’)
AND t.engine <> ‘InnoDB’
AND t.table_type = ‘BASE TABLE’;

— Find tables and indexes with the most latency
— Amdhal’s law: the overall performance improvement gained by optimizing a single part of a system, is limited by the fraction of time that the improved part is actually used
SELECT *
FROM sys.schema_table_statistics
WHERE table_schema=’test’
AND table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’);

SELECT *
FROM sys.schema_index_statistics
WHERE table_schema=’test’
AND table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’);

— Find tables whose indexes > data by 50%
SELECT table_schema, table_name, index_length, data_length, index_length/data_length AS index_to_data_ratio
FROM information_schema.tables
WHERE table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’)
AND INDEX_LENGTH > DATA_LENGTH*1.5;

— Find tables with duplicate indexes
SELECT table_schema,table_name,redundant_index_name AS redundant_index, redundant_index_columns AS redundant_columns, dominant_index_name AS covered_by_index,sql_drop_index
FROM sys.schema_redundant_indexes
WHERE table_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’);

— Find unused indexes
SELECT *
FROM sys.schema_unused_indexes
WHERE object_schema NOT IN (‘mysql’, ‘information_schema’, ‘sys’, ‘performance_schema’);

Subscribe
Notify of
guest

4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Hector

Really nice post, only to comment: with unused indexes you will need take a multiple reads of this table before that you decide delete an index.

Bill Karwin

These may be “code smells” but we should always use our judgment because there are legitimate cases that are an exception to the rule.

It’s not unusual in my experience that some tables have a lot of indexes, sometimes these are bulky and can add up to even greater than 100% of the size of the table.

Duplicate indexes and unused indexes is a bit complicated if you consider unique keys. A unique key may be needed even if it counts as a duplicate or an unused index by your queries.

On the unused indexes query, keep in mind this is reset if MySQL Server restarts. Some indexes are used infrequently, for example for monthly reports. So they could be identified as “unused” even though they are needed. One should keep notes in a wiki or something to document why each index was created, and identify the code and query that needs them.

I’d add to the list of schema design code smells the use of FLOAT or DOUBLE data type. Often it should be DECIMAL instead, especially for storing currency values.

gggeek

It would probably be a good idea to specify which versions of MySQL and MariaDB support the above queries, and/or if they enabling of some specific features (is per. schema always available and populated?)

As an example, sys.schema_unused_indexes is only available in MariaDb starting from 10.6…