Unused indexes warrant an evaluation for several reasons. It can be a red flag that the MySQL Optimizer is not choosing to use the index, meaning that the index is not worth the effort to use. Additionally, unused indexes can take up valuable disk space.
In MySQL 5.7.7, the sys schema was introduced to make statistics gathered from the performance_schema and information_schema more readable. It is especially helpful for identifying unused indexes.
The following command will create a list of unused indexed:
mysql> select * from sys.schema_unused_indexes;
The output will look something like this:
mysql> select * from schema_unused_indexes;
+-----------------+-------------------+----------------------------+
| object_schema | object_name | index_name |
+-----------------+-------------------+----------------------------+
| database | tbl_1 | idx_1 |
| database | tbl_2 | idx_2 |
+-----------------+-------------------+----------------------------+
But now what? If an index is unused, do we just drop it? Not necessarily. More analysis needs to be done to make sure that an unused index is truly not useful.
There are several things to consider. Below are a few questions to ask yourself.
- What is your uptime? Is the index unused because it just hasn’t been used yet but will?
- Is there a forgeign key (FK) constraint? If so, should there be one?
- How useful is the index? Is it selective or will the optimizer decide that the cost of using the index is not worth it?
In this blog, I will go through each consideration. Using these steps can help identify indexes that are good candidates for removal and should be investigated further.
Uptime
The index status “unused” can be somewhat misleading. Query statistics are cleared every time the MySQL instance is rebooted thus the index is “unused” since the last reboot. If you run the query for unused indexes right after a reboot, there would be more indexes in the output than if you ran it a few months later because applications are given more time to query the instance and use indexes. The longer the uptime, the more confident you can be in the unused index output.
If rebooting the MySQL server is part of routine maintenance, it is best to run the command for the unused index output as close to scheduled maintenance as possible.
FK Constraint
An FK is a column with unique keys that reference the primary key of another table. FK constraints prevent alterations between tables that would corrupt the link between the FK and referenced primary key.
To find out if a column has an FK constraint, we have to look at the table definition. When conducting unused index analysis, it is good to start with the ones from the largest tables.
The table definition can be derived using the command below.
mysql> show create table tbl_1;
The FK constraint will look something like the output below. It will start with the word ‘CONSTRAINT’ followed by the index name, FK column, and referenced PK on another table.
CONSTRAINT `idx_name` FOREIGN KEY (`column_name`) REFERENCES `another_tbl` (`primary_key`),
If there is an FK constraint on the index, do not drop the index. This doesn’t mean that there should be an FK constraint or the query doesn’t need tuning. It just means that dropping the index will throw an error.
Selectivity
Selectivity relates to the specificity of an index. The more distinct the values are in the index columns, the higher the selectivity. Selectivity closest to 1.00 is ideal.
The equation below shows the relationship between distinct values and selectivity.
Selectivity = distinct values/ total number of rows
Selectivity closer to 1.00 is best.
The query below can be used to get the number of distinct values and total rows.
mysql> select count(distinct idx_column) "DISTINCT KEYS", count(idx_column) "TOTAL NUM ROWS" from tbl1;
+---------------+----------------+
| DISTINCT KEYS | TOTAL NUM ROWS |
+---------------+----------------+
| 669430 | 2581845 |
+---------------+----------------+
Selectivity for the given example is 0.26 which is not considered a useful index.
Selectivity Defined Differently
You might get conflicting information on high or low selectivity being ideal. I know I did. I realized that there were two methods of defining selectivity. An alternative method being used was calculating the ratio of returned rows using the index to the total number of rows in the table (not the same as the number of rows scanned).
Alternative Selectivity Equation:
Selectivity = returned rows using index/ total number of rows
Selectivity < 0.20 is ideal
To get these numbers, a query is needed for profiling. (Profiling is a whole other topic that is also worth researching.) If thinking about selectivity in this way, a ratio of less than 0.20 is ideal.
Following the methods in this blog is the first step to deciding if an index should be dropped. There is still work to be done before actually dropping the index, especially on a production server. I recommend testing the impact of removing an index in a dev or replica server before dropping an index on a production server. Furthermore, MySQL 8.0 allows you to test the removal of an index without actually dropping it. For more information on invisible indexes, check out the MySQL documentation.
Please contact us for any questions you may have!