
I have a MySQL table with just under 200 million rows. I have a Django query that looks like this:

foo.objects.filter(field1_id__in=[about 27 items], 
                   field2_id__in=[about 25 values], 
                   field3=value)

I noticed that the page that runs this filter is hanging today. The page yesterday rendered in about one second. The field1 list grows over time as more data is added, the field2 list is constant size. Playing with these queries interactively I determined that there is a cliff where if I only specified the first 9 values of the field2 "in," the query returns in about a second, but if I move to 10 values in the field2 list, the query hangs "forever."

Does such a severe degradation make sense? There are no joins and no dependent queries, just a WHERE with 3 clauses ANDed together, two of them being the INs. Feels like a MySQL bug...? Or "that's just life with MySQL?"

Edit: the 10-item field2_id__in query just returned: took about 45 minutes!

Raw Query

SELECT `mytable`.`id`, `mytable`.`field1_id`, `mytable`.`field2_id`, `mytable`.`field3_id`, `mytable`.`field4_id`, `mytable`.`data` FROM `mytable` WHERE (`mytable`.`field2_id` IN (44942, 42953, 43099, 43330, 45165, 45468, 43518, 45620, 43693, 45760, 43790, 45930, 43885, 46026, 46120, 44158, 46298, 44314, 42204, 46492, 44441, 42327, 44586, 42515, 44726, 44835, 42802) AND `mytable`.`field3_id` IN (3, 17, 696, 150, 170, 51, 6528, 2383, 3342, 2289, 6491, 6375,2070, 6186, 318, 6498, 5197, 6011, 5833, 7803, 5195, 4871, 6928, 6531) AND `mytable`.`field4_id` = 11 )

Explain Output

select_type: simple
type: range
possible_keys: (3 keys)
key: (key)
key_len: 4
ref: NULL
rows: 14160 
extra: Using index condition; Using where
  • Two comments: First, have you tried running the various versions of the query directly from MySQL? Second, have you tried using EXPLAIN to see what the execution plan is? Commented Oct 22, 2015 at 1:26
  • What is the data_type of field_2? Commented Oct 22, 2015 at 1:27
  • I've done neither. EXPLAIN sounds like a good idea. Both field1 and field2 are foreign keys, but the query is on the id, so it's simple integer comparisons. Commented Oct 22, 2015 at 1:27
  • Yes the query hangs when executed directly in MySQL. I ran the explain but don't know how to interpret the results. select_type: simple, type: range, possible_keys: (3 keys), key: (key), key_len: 4, ref: NULL, rows: 14160 extra: Using index condition; Using where Commented Oct 22, 2015 at 1:39

1 Answer

It looks like all three fields are foreign keys on the foo table. MySQL will generally use only one index per table for a query like this, so add a single composite index covering all three fields to your model, so that it is the one chosen:

class Meta:
    index_together = ["field1", "field2", "field3"]

Write performance will take a small hit, but at least you will be able to query your data. You do not need an index for every combination: with the index above, a query on all three fields, on field1 alone, or on field1 and field2 can use it, because MySQL matches the index columns from left to right and can ignore the trailing ones. A query that filters only on field2 or field3 cannot use it that way. Personally, I have never seen write performance suffer so much that I regretted adding the indexes a table needed. Expect the index to take several hours to build on 200 million rows.
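
To see the left-to-right rule in action, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, purely so it runs without a server; that swap is an assumption of this example, and SQLite's planner is not MySQL's, although both apply the leftmost-prefix rule to composite indexes. The index name idx_f1_f2_f3 and the plan() helper are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " id INTEGER PRIMARY KEY,"
    " field1_id INTEGER, field2_id INTEGER, field3_id INTEGER,"
    " data TEXT)"
)
# Hypothetical name for the composite index on (field1, field2, field3).
conn.execute(
    "CREATE INDEX idx_f1_f2_f3 ON mytable (field1_id, field2_id, field3_id)"
)


def plan(where):
    """Return the EXPLAIN QUERY PLAN detail text for a SELECT with this WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM mytable WHERE " + where
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# All three columns, the leftmost column alone, and the rightmost column alone.
all_three = plan("field1_id IN (1, 2) AND field2_id IN (3, 4) AND field3_id = 11")
prefix_only = plan("field1_id = 1")
no_prefix = plan("field3_id = 11")

for label, text in [
    ("all three", all_three),
    ("field1 only", prefix_only),
    ("field3 only", no_prefix),
]:
    print(label, "->", text)
```

The first two plans should name idx_f1_f2_f3, while the field3-only query should fall back to scanning the table. On MySQL, the equivalent check is to run EXPLAIN on each query and look at the key column.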

Note that Django automatically creates an index for each ForeignKey field; otherwise joins would be painfully slow. That is why your EXPLAIN output says possible_keys: (3 keys): those are most likely the single-column indexes on field1 through field3.

Databases do jump off cliffs, and with a table of 200 million rows I am not surprised yours did. Indexes are vital to making databases purr.

2 Comments

What are the ramifications of adding such an index (size/write performance)? I have many ways of accessing this table, and I'm not sure I want to add a ton of special indexes (but I will if that is the only option). And why did the performance drop from 1 second to 45 minutes when the size of one of the IN lists moved from 9 to 10 elements?
I will edit the answer because there is too much to write here.
