mysql IN query problem

Question

select docid  from A  where  docid IN ( select distinct(docid) from B)

When I execute the above query in MySQL it takes 33 seconds, which is far too long for the size of the data.

Below are the details of both tables.

   Table A :
   | docid       | int(11)  | NO   | PRI | NULL    |       |
   Total number of entries = 500 (all entries are unique)

   Table B:
   | docid       | int(11)  | YES  |     | NULL    |       |
   Total number of entries = 66508
   (number of unique entries is 500)

   mysql version : 5.2

If I execute only select docid from A, it takes 0.00 seconds, while select docid from B takes 0.07 seconds.

Then why does the IN query with a subquery take 33 seconds? Am I doing something wrong?

I expected this query to execute within a second, so why is it taking so long?
– Kunal, Commented Aug 12, 2011 at 11:43
Run desc select docid from A where docid IN ( select distinct(docid) from B); to see the execution plan. The overhead comes from the number of rows that must be scanned in order to match the IN().
– ajreal, Commented Aug 12, 2011 at 11:58
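
To act on the suggestion above, the plan can be inspected with EXPLAIN (DESC is a synonym for it in MySQL). This is only a sketch of what to look at, not output from the asker's server: the columns worth reading are select_type and rows, and a select_type of DEPENDENT SUBQUERY on the inner select would indicate that it is re-evaluated for each outer row.

```sql
-- Show how MySQL plans the slow query; nothing is executed against the data.
EXPLAIN
SELECT docid
FROM A
WHERE docid IN (SELECT DISTINCT docid FROM B);

-- For comparison, the plan of the join form with duplicates removed.
EXPLAIN
SELECT DISTINCT A.docid
FROM A
JOIN B ON B.docid = A.docid;
```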

Bohemian · Accepted Answer · 2011-08-12 11:48:04Z

6

The IN subquery has to read a large table, over 60K rows in B. You would be better off using a join:

select A.docid -- docid must be qualified with A., since both tables have a docid column
from A
join B on B.docid = A.docid;

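This should execute very quickly. Note that because B contains repeated docid values, the plain join returns one row for every matching row in B; use select distinct A.docid to get the same result as your "IN" query.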
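
A minimal sketch of the join written so that each docid comes back once, as it does with IN. The index is an optional extra that is not part of the original answer, and its name idx_b_docid is a made-up example; it assumes B.docid is currently unindexed, which is what the table description in the question suggests.

```sql
-- Optional: index B.docid so lookups by docid do not scan all of B.
-- (idx_b_docid is a hypothetical name.)
CREATE INDEX idx_b_docid ON B (docid);

-- Join form of the query; DISTINCT collapses the many B rows per docid
-- so the result matches the IN version row for row.
SELECT DISTINCT A.docid
FROM A
JOIN B ON B.docid = A.docid;
```

Without the DISTINCT the join is still fast, but it returns one row per row of B that has a match in A rather than one row per docid.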

edited Aug 12, 2011 at 11:48

answered Aug 12, 2011 at 11:33


3 Comments

ajreal Over a year ago

You need to qualify the column in the select list with its table, i.e. select A.docid.

Kunal Over a year ago

Bohemian, I have edited my question because I used distinct in that query. When I execute "select distinct(docid) from B" it takes only 0.07 seconds, so why does it take 33 seconds with the IN query?

eimaj Over a year ago

... because the separate queries can use the primary key index to find the matching rows but the subquery might get executed like a "for loop" where every matching row in the subquery from B is matched against A. The JOIN is more effectively optimised in MySQL at the moment, e.g. see technocation.org/content/oursql-episode-29%3A-subpar-subqueries and the MySQL manual dev.mysql.com/doc/refman/5.5/en/optimizing-subqueries.html

Brian · Accepted Answer · 2011-08-12 11:57:58Z

4

MySQL doesn't handle IN (subquery) well. It executes the inner query again for every row the outer query examines, rather than "remembering" the results. With 500 rows in A and 66,508 rows in B, that can mean up to about 33 million row reads.

Hence you are much better off doing a join.

Other RDBMSes don't do this btw.
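
To make the difference concrete, here is a hedged sketch of two forms. The first is roughly the correlated shape that MySQL versions of this era are understood to turn IN (subquery) into, which is why the inner part runs per outer row. The second forces the distinct list to be computed once as a derived table and then joined; the alias b_ids is a hypothetical name chosen for illustration.

```sql
-- Roughly how the IN (subquery) is evaluated: the inner select
-- depends on the current row of A and is re-run for each one.
SELECT A.docid
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE B.docid = A.docid);

-- Alternative: build the distinct docid list once, then join to it.
-- (b_ids is a hypothetical alias for the derived table.)
SELECT A.docid
FROM A
JOIN (SELECT DISTINCT docid FROM B) AS b_ids
  ON b_ids.docid = A.docid;
```

The derived-table form returns each matching docid once, so it gives the same rows as the original IN query without needing an extra DISTINCT on the outer select.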

answered Aug 12, 2011 at 11:57


1 Comment

Kunal Over a year ago

Thanks, Brian, for making this so clear.
