The Problem
I have a table with about 2 million rows (115 MB), and it is about to get much larger. While running some utility scripts against the table, I noticed that one of my queries took a long time (10+ seconds), while a nearly identical query run right before it took less than half a second. Here are the queries:
Query 1:
SELECT `id` FROM `my_table` WHERE `my_column`='test' ORDER BY `id` LIMIT 28000, 1000
Execution time: 0.204 seconds
Query 2:
SELECT `id` FROM `my_table` WHERE `my_column`='test' ORDER BY `id` LIMIT 29000, 1000
Execution time: 10.203 seconds
Indexing and table info
id is the primary key and my_column is also indexed (although at the moment its cardinality is only 1, meaning essentially every row currently holds the same value)
• id is an int
• my_column is a varchar(50)
Queries explained
Query 1: type: index, possible_keys: my_column, key: PRIMARY, key_len: 4, rows: 29,000, Extra: Using where
Query 2: type: range, possible_keys: my_column, key: my_column, key_len: 53, rows: 2,139,123, Extra: Using where; Using filesort
As you can see, the second query uses the my_column key plus a filesort and takes far longer, even though all I did was increase the LIMIT offset by 1,000.
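For anyone who wants to reproduce the comparison, plans like the ones above come from prefixing each query with EXPLAIN. A minimal sketch, using the same table and column names as the queries in this post:

```sql
-- Plan for the fast query (offset 28000)
EXPLAIN
SELECT `id` FROM `my_table`
WHERE `my_column` = 'test'
ORDER BY `id`
LIMIT 28000, 1000;

-- Plan for the slow query (offset 29000)
EXPLAIN
SELECT `id` FROM `my_table`
WHERE `my_column` = 'test'
ORDER BY `id`
LIMIT 29000, 1000;
```

The only difference between the two statements is the offset, so any change in the key, type, or Extra columns comes from the optimizer's choice rather than from the query text.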
How I temporarily fixed the problem
1) If I remove the WHERE my_column = 'test' condition, the MySQL optimizer correctly uses the primary key to sort. However, I can't remove this condition, because soon there will be other values in my_column that I'm going to need to filter out for this query.
2) If I use FORCE INDEX (PRIMARY), the MySQL optimizer also uses the proper index, but this feels like a hack.
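For clarity, the second workaround is Query 2 with an index hint added right after the table name. A sketch of what that looks like:

```sql
-- Same as Query 2, but the hint tells the optimizer to use the primary key,
-- so rows are read in id order and no filesort is needed.
SELECT `id`
FROM `my_table` FORCE INDEX (PRIMARY)
WHERE `my_column` = 'test'
ORDER BY `id`
LIMIT 29000, 1000;
```

This works today because every row matches the WHERE condition. Once my_column holds many different values, scanning the primary key and discarding non-matching rows may no longer be the cheapest plan, which is part of why hard-coding the hint feels fragile.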
My question
Why exactly does MySQL choose the my_column index instead of the primary key once the offset grows? And is there a better way to handle this, whether in the table definition, the indexes, or the query structure?
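To make the "indexes" part of the question concrete, one kind of change it has in mind is a composite index that covers both the filter and the sort. This is an untested sketch, not a confirmed fix: the index name idx_my_column_id is hypothetical, and whether the optimizer would actually prefer it for these queries is an assumption.

```sql
-- Hypothetical composite index: equality filter column first, sort column second.
-- In principle this lets one index satisfy both WHERE my_column = ... and ORDER BY id.
ALTER TABLE `my_table`
  ADD INDEX `idx_my_column_id` (`my_column`, `id`);

-- Re-check the plan for the slow query after adding the index.
EXPLAIN
SELECT `id` FROM `my_table`
WHERE `my_column` = 'test'
ORDER BY `id`
LIMIT 29000, 1000;
```

If the Extra column no longer shows "Using filesort" after this change, the index is being used for ordering as well as filtering.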