I am hitting some performance issues on a MySQL server.

I am trying to query a large table (~500k rows) for a subset of data:

SELECT * FROM `my_table` WHERE `subset_id` = id_value;

This query takes ~80 ms to run, but I need to run it for over 20k values of "id_value", which brings the total execution time to almost 1 h. I was hoping that adding an index on subset_id would help, but it doesn't change anything (which makes sense, given how indexes work).

What I am trying to figure out is whether there is any way to "index" the table so that this query takes something more reasonable than 80 ms. Or, in other words, is ~80 ms "normal" for querying a 500k-row table?

Note: In the larger picture, I am using parallel queries and multiple connections to speed up the process, and I have tried various optimizations such as changing the innodb_buffer size. I'm also considering querying the db once for all 500k rows into a larger object instead of running 20k*xx queries, but since my code is designed in a multiprocessed/coroutine-based/scalable way, I was trying to avoid this and focus on optimizing the query/MySQL server at the lowest level.

Thanks!

  • As a note, adding "ORDER BY subset_id" seems to speed up the query by 4x Commented Dec 14, 2018 at 14:04
  • "Premature optimization is the root of all evil." Commented Dec 14, 2018 at 15:31
  • @jchevali how is that relevant? Commented Dec 14, 2018 at 15:36
  • You be the judge of what's relevant. It's your thread. Commented Dec 14, 2018 at 16:00
  • @jchevali I am genuinely wondering what made you think of early optimizations Commented Dec 14, 2018 at 16:30

1 Answer

Use a single query with IN rather than a zillion queries:

SELECT *
FROM `my_table`
WHERE `subset_id` IN (id1, id2, . . .);

If your ids are already in a table -- or you can put them in one -- then use a table instead. You can still use IN:

SELECT *
FROM `my_table`
WHERE `subset_id` IN (SELECT id FROM idtable);
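
If the ids live in application code, one way to apply the IN approach is to send them in batches rather than one query per id. The following is only a minimal sketch under assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs on its own, the `payload` column, the index name, and the helpers `chunked` and `fetch_subsets` are hypothetical, and the `?` placeholder is sqlite3's style (most MySQL drivers for Python use `%s` instead).

```python
import sqlite3


def chunked(seq, size):
    """Yield successive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_subsets(conn, ids, batch_size=500):
    """Fetch all rows whose subset_id is in `ids`, one IN query per batch."""
    rows = []
    for batch in chunked(list(ids), batch_size):
        # One placeholder per id keeps the values parameterized.
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT * FROM my_table WHERE subset_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# Stand-in data: 1000 rows spread over 100 subset ids (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, subset_id INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_subset_id ON my_table (subset_id)")
conn.executemany(
    "INSERT INTO my_table (subset_id, payload) VALUES (?, ?)",
    [(i % 100, f"row-{i}") for i in range(1000)],
)

# 20 ids fetched with 3 queries (batches of 8, 8 and 4) instead of 20 queries.
rows = fetch_subsets(conn, range(20), batch_size=8)
```

The batch size is a tuning knob, not a recommendation: larger batches mean fewer round trips, but very long IN lists can run into statement-size or placeholder limits depending on the server and driver.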

5 Comments

Would also suggest not to use select *, but to include columns explicitly
Thanks, that's a great suggestion and even though not exactly answering my question this is how I think it is going to be implemented eventually.
@Gauravsa thanks for the suggestion, but it doesn't seem to improve performance
Don't use the IN ( SELECT ... ); use a JOIN instead.
Sometimes it can make things worse, as simple queries with a single id would be cached while more complicated queries with a list of random IN (ids) will not be cached (no repeating queries with the same ids, result too big, etc.). I do not know the exact situation, so I can't be sure.
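
For reference, the JOIN form suggested in the comments above could look like the sketch below. It is illustrative only: sqlite3 stands in for MySQL so the example is runnable, `idtable` is the table name from the answer, and the `payload` column, index name, and sample ids are made up. Whether the JOIN actually beats IN (SELECT ...) depends on the MySQL version and its optimizer, so it is worth measuring both.

```python
import sqlite3

# In-memory stand-in for the real tables (hypothetical schema).
join_conn = sqlite3.connect(":memory:")
join_conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, subset_id INTEGER, payload TEXT)"
)
join_conn.execute("CREATE INDEX idx_subset_id ON my_table (subset_id)")
join_conn.execute("CREATE TABLE idtable (id INTEGER PRIMARY KEY)")

join_conn.executemany(
    "INSERT INTO my_table (subset_id, payload) VALUES (?, ?)",
    [(i % 100, f"row-{i}") for i in range(1000)],
)
# The ids to look up are loaded once into idtable.
join_conn.executemany("INSERT INTO idtable (id) VALUES (?)", [(3,), (7,), (42,)])

# Same result as WHERE subset_id IN (SELECT id FROM idtable), written as a JOIN.
join_sql = """
SELECT t.*
FROM my_table AS t
JOIN idtable AS i ON i.id = t.subset_id
"""
joined = join_conn.execute(join_sql).fetchall()
```

Because `idtable.id` is unique here, the JOIN returns each matching row exactly once; with duplicate ids in the lookup table it would return duplicates, which the IN form would not.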
