More efficient query than NOT IN (nested select)

Question

I have two tables, table1 and table2. Their definitions are:

CREATE TABLE `table1` (
    `table1_id` int(11) NOT NULL AUTO_INCREMENT,
    `table1_name` VARCHAR(256),
    PRIMARY KEY (`table1_id`)
);

CREATE TABLE `table2` (
    `table2_id` int(11) NOT NULL AUTO_INCREMENT,
    `table1_id` int(11) NOT NULL,
    `table1_name` VARCHAR(256),
    PRIMARY KEY (`table2_id`),
    FOREIGN KEY (`table1_id`) REFERENCES `table1` (`table1_id`)
);

I want to know the number of rows in table1 that are NOT referenced in table2. That can be done with:

SELECT COUNT(t1.table1_id) FROM table1 t1 
WHERE t1.table1_id NOT IN (SELECT t2.table1_id FROM table2 t2)

Is there a more efficient way of performing this query?

Bill Karwin · Accepted Answer · 2014-09-09 22:58:54Z

3

Upgrade to MySQL 5.6, which optimizes semi-joins against subqueries better.

See http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

Or else use an exclusion join:

SELECT COUNT(t1.table1_id) FROM table1 t1 
LEFT OUTER JOIN table2 t2 USING (table1_id)
WHERE t2.table1_id IS NULL

Also, make sure table2.table1_id has an index on it.
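
A minimal sketch of checking for that index and adding it if needed. With InnoDB, declaring the foreign key normally creates an index on table2.table1_id automatically, so the first statement may show it is already there. The index name idx_table2_table1_id is a made-up placeholder.

```sql
-- List the existing indexes on table2; look for one whose first column is table1_id.
SHOW INDEX FROM table2;

-- Only if no such index exists: add one (the index name is hypothetical).
ALTER TABLE table2 ADD INDEX idx_table2_table1_id (table1_id);
```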

answered Sep 9, 2014 at 22:58

Bill Karwin


2 Comments

A.O. Over a year ago

So after reading through the link you provided, NOT IN resorts to Materialization or Exists; in my case, Exists. So should I use the exclusion JOIN or the NOT EXISTS strategy?

Bill Karwin Over a year ago

I'd stick with the exclusion join, because the materialization creates a temporary table. You should learn to use EXPLAIN to examine the optimization plan so you can test both query forms yourself.
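
As a rough sketch of that comparison, prefix each query form from the question and answer with EXPLAIN. The output depends on the MySQL version, the indexes, and the data, so none is shown here.

```sql
-- Plan for the original NOT IN form.
EXPLAIN
SELECT COUNT(t1.table1_id) FROM table1 t1
WHERE t1.table1_id NOT IN (SELECT t2.table1_id FROM table2 t2);

-- Plan for the exclusion join.
EXPLAIN
SELECT COUNT(t1.table1_id) FROM table1 t1
LEFT OUTER JOIN table2 t2 USING (table1_id)
WHERE t2.table1_id IS NULL;
```

Columns worth comparing between the two plans are select_type (for example DEPENDENT SUBQUERY), key, rows, and Extra.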

John Ruddell · Accepted Answer · 2014-09-10 01:19:18Z

3

Try using EXISTS; it's generally more efficient than IN. For example, this counts the rows in table1 that are referenced in table2:

SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id <=> t1.table1_id
)

You can do it with NOT EXISTS as well, which gives the count the question asks for (rows that are not referenced):

SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE NOT EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id = t1.table1_id
)

EXISTS is generally faster because it can stop searching as soon as it finds a hit, since the condition has then been proved true. The problem with IN is that it collects all the results from the subquery before further processing, and that takes longer.

As @BillKarwin noted in the comments, EXISTS uses a dependent subquery. Here is the EXPLAIN output for my two queries and for the OP's query: http://sqlfiddle.com/#!2/53199d/5

edited Sep 10, 2014 at 1:19

answered Sep 9, 2014 at 22:57

John Ruddell


7 Comments

Bill Karwin Over a year ago

Your example shows a correlated subquery, which will be executed for each distinct value in the outer query. Try it with EXPLAIN; it shows DEPENDENT SUBQUERY.

John Ruddell Over a year ago

@BillKarwin Hmm, interesting. So how is EXISTS faster than IN()?

wildplasser Over a year ago

@BillKarwin Only in MySQL. Normal brands of SQL handle NOT EXISTS as expected.

Bill Karwin Over a year ago

The OP's question was tagged mysql so I assume they want to know about MySQL behavior. Besides, can you name any brand of SQL database that can optimize out a correlated subquery? I'm not arguing against the possibility, I've just never encountered one.

Martin Smith Over a year ago

@BillKarwin EXISTS and IN get exactly the same plan in SQL Server, with a semi join operator. Similarly, NOT EXISTS and NOT IN both get an anti semi join operator (though NOT IN can get additional baggage if either column is nullable). Generally, SQL Server can de-correlate subqueries in a number of cases.
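
The nullable-column caveat for NOT IN can be illustrated with a small sketch. The parent and child tables here are hypothetical scratch tables, not the ones from the question; in the question's schema table2.table1_id is declared NOT NULL, so this difference does not arise there.

```sql
-- Hypothetical scratch tables; child.parent_id is deliberately nullable.
CREATE TABLE parent (id INT NOT NULL PRIMARY KEY);
CREATE TABLE child (parent_id INT NULL);

INSERT INTO parent (id) VALUES (1), (2), (3);
INSERT INTO child (parent_id) VALUES (1), (NULL);

-- NOT IN: for ids 2 and 3 the comparison against NULL yields NULL (unknown),
-- so no row passes the filter and the count is 0.
SELECT COUNT(*) FROM parent p
WHERE p.id NOT IN (SELECT c.parent_id FROM child c);

-- NOT EXISTS: the NULL row never matches any id, so ids 2 and 3 are counted (2).
SELECT COUNT(*) FROM parent p
WHERE NOT EXISTS (SELECT 1 FROM child c WHERE c.parent_id = p.id);
```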
