3

I have two tables table1 and table2 their definitions are:

CREATE `table1` (
    'table1_id' int(11) NOT NULL AUTO_INCREMENT,
    'table1_name' VARCHAR(256),
     PRIMARY KEY ('table1_id')
)

CREATE `table2` (
    'table2_id' int(11) NOT NULL AUTO_INCREMENT,
    'table1_id' int(11) NOT NULL,
    'table1_name' VARCHAR(256),
     PRIMARY KEY ('table2_id'),
     FOREIGN KEY ('table1_id') REFERENCES 'table1' ('table1_id')
)

I want to know the number of rows in table1 that are NOT referenced in table2, that can be done with:

SELECT COUNT(t1.table1_id) FROM table1 t1 
WHERE t1.table1_id NOT IN (SELECT t2.table1_id FROM table2 t2)

Is there a more efficient way of performing this query?

2 Answers 2

3

Upgrade to MySQL 5.6, which optimizes semi-joins against subqueries better.

See http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

Or else use an exclusion join:

SELECT COUNT(t1.table1_id) FROM table1 t1 
LEFT OUTER JOIN table2 t2 USING (table1_id)
WHERE t2.table1_id IS NULL

Also, make sure table2.table1_id has an index on it.

Sign up to request clarification or add additional context in comments.

2 Comments

So after reading through the link you provided, NOT IN resorts to Materialization or Exists, in my case Exists....so should I use the exclusion JOIN or the NOT EXISTS strategy?
I'd stick with the exclusion join, because the materialization creates a temporary table. You should learn to use EXPLAIN to examine the optimization plan so you can test both query forms yourself.
3

try using EXISTS.. its generally more efficient than IN

SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id <=> t1.table1_id
)

you can do it with NOT EXISTS as well

SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE NOT EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id = t1.table1_id
)

EXISTS is generally faster because the execution plan is once it finds a hit, it will quit searching since the condition has proved true. The problem with IN is it will collect all the results from the subquery before further processing... and that takes longer

As @billkarwin noted in the comments EXISTS is using a dependent subquery.. Here is the explain on my two queries and also the OP's query.. http://sqlfiddle.com/#!2/53199d/5

7 Comments

Your example shows a correlated subquery, which will be executed for each distinct value in the outer query. Try it with EXPLAIN, it shows DEPENDENT SUBQUERY.
@BillKarwin hmm interesting.. so how is EXISTS faster than IN()?
@BillKarwin Only in mysql. Normal brands of SQL handle NOT EXISTS as expected.
The OP's question was tagged mysql so I assume they want to know about MySQL behavior. Besides, can you name any brand of SQL database that can optimize out a correlated subquery? I'm not arguing against the possibility, I've just never encountered one.
@BillKarwin exists and in get exactly the same plan in Sql server with a semi join operator. Similarly with not exists and not in get an anti semi join operator (though not in can get additional baggage if either column is nullable). Generally SQL Server can de-correlate sub queries in a number of cases
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.