I have the following MySQL query. The logic is correct, but since there are over 10,000 seeker emails and over 24,000 guest emails, it takes a long time to execute. Is there a better way to do the same thing?

SELECT g.email, g.name
FROM   guest g
WHERE  g.type='guest' 
AND g.email NOT IN (SELECT email FROM seeker GROUP BY email)
GROUP  BY g.email
  • Is seeker properly indexed by email? SELECT DISTINCT email might help as well, if there are many duplicate emails in seeker. Commented Sep 9, 2013 at 3:10
  • At the moment I have only primary key indexes. Maybe I should add an index on email as well and try again. As I said, guest has over 24,000 rows and seeker has over 10,000. I will edit the question with the EXPLAIN results. Commented Sep 9, 2013 at 3:11
  • Regarding your SELECT list: MySQL lets you do this, but it shouldn't. Only include non-aggregated columns in the SELECT list that are also in GROUP BY. Commented Sep 9, 2013 at 3:20
  • Using GROUP BY without listing every non-aggregated column will remove entries from your list that you may want. See SQLFIDDLE. Commented Sep 9, 2013 at 4:14
  • Please review the MySQL docs on the performance hit of a sub-select. I have only ever used them as a last resort, and the examples using a LEFT JOIN deserve careful consideration. Also index all the fields you use in your WHERE and JOIN clauses. When you run EXPLAIN on the query, you want to minimize "Using temporary" and "Using filesort" steps. Commented Sep 9, 2013 at 18:39

5 Answers

3

Try this:

SELECT
    g.email, g.name
FROM
    guest g
LEFT JOIN
    seeker s
ON
    s.email = g.email
WHERE
    g.type = 'guest'
AND
    s.email IS NULL
GROUP BY
    g.email;

http://sqlfiddle.com/#!2/d94bf/5

4 Comments

Doesn't this assume that only rows with a NULL s.email are desired? I presume the OP wants every g.email that is not in s.email.
It's the same thing. It's a little trick with LEFT JOIN. This format is faster than using a nested SELECT.
Thank you. I was pondering that for a while yesterday: s.email is left unset (NULL) when the join condition is not satisfied. And that could perform a LOT faster than a sub-select.
This is amazingly faster than subselect. Wow.
1
SELECT DISTINCT g.email, g.name
FROM   guest g
WHERE  g.type='guest' 
AND NOT EXISTS (SELECT 1 FROM seeker s WHERE g.email = s.email)

Also make sure you have indexes on seeker.email, guest.type, and guest.email. It is even better if those columns are declared NOT NULL.
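
A minimal sketch of what adding those indexes might look like. The table and column names come from the question; the index names are hypothetical, and combining guest.type and guest.email into one composite index is an assumption rather than something stated above.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
CREATE INDEX idx_seeker_email ON seeker (email);

-- Assumption: one composite index covering the filter column and the lookup column.
CREATE INDEX idx_guest_type_email ON guest (type, email);

-- Re-check the execution plan after adding the indexes.
EXPLAIN
SELECT DISTINCT g.email, g.name
FROM   guest g
WHERE  g.type = 'guest'
AND    NOT EXISTS (SELECT 1 FROM seeker s WHERE s.email = g.email);
```

Comparing the EXPLAIN output before and after should show whether the optimizer actually picks up the new indexes.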

6 Comments

According to the MySQL Optimizer manual, this query format is what already happens under the hood when you do a NOT IN (SELECT).
You mean the NOT IN is transformed into a NOT EXISTS?
Yes, something like that. I read about that just the other day in a similar question.
I'm skeptical. NOT IN (NULL) is hardly understood by MySQL as a NOT EXISTS (NULL).
I think this query doesn't do what I want. I get a different result (row count) from this: " SELECT g.email, g.name FROM guest g WHERE g.type = 'guest' AND g.email NOT IN ( SELECT email FROM seeker ) GROUP BY g.email "
0
SELECT DISTINCT g.email, g.name
FROM   guest g
LEFT OUTER JOIN seeker s ON s.email = g.email
WHERE  g.type='guest' AND s.email IS NULL

2 Comments

DISTINCT may have a different result. While GROUP BY g.email will ensure there are no repeated e-mails, DISTINCT will only ensure there are no repeated e-mail and name combinations. sqlfiddle.com/#!2/d94bf/3
@Havenard I suspect that the OP doesn't want to do grouping, since there is no aggregate function in the SELECT. Just take a look at his subselect. I think he used GROUP BY but wants the DISTINCT effect :) GROUP BY will remove data from the list.
0

You don't need GROUP BY in the inner query. You can use DISTINCT instead.

SELECT g.email, g.name
FROM   guest g
WHERE  g.type='guest' 
       AND g.email NOT IN (SELECT DISTINCT email FROM seeker)
GROUP  BY g.email

This will work too:

SELECT g.email, g.name
FROM   guest g left outer join seeker s on g.email = s.email
WHERE  g.type='guest' 
       AND s.email is null
GROUP  BY g.email

There will be a lot of string comparisons in your query, so it would help to index email in your tables, especially seeker.


Also, avoid using SELECT columns that are non-aggregated and not present in GROUP BY. The result is indeterminate.

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

More in manual.
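
A minimal sketch of one way to make the selected name deterministic, assuming it is acceptable to pick a single name per email. Using MIN() here is an arbitrary illustrative choice, not something the question asks for.

```sql
-- Every selected column is either in GROUP BY or wrapped in an aggregate,
-- so the result no longer depends on which row the server happens to pick.
SELECT g.email, MIN(g.name) AS name
FROM   guest g
LEFT OUTER JOIN seeker s ON g.email = s.email
WHERE  g.type = 'guest'
       AND s.email IS NULL
GROUP  BY g.email;
```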

5 Comments

@Havenard I think the result of SELECT email FROM seeker GROUP BY email, when used in IN, will produce the same effect as SELECT DISTINCT email FROM seeker. Got any examples or an explanation?
@Havenard Look at the query this way: select every guest email which is not in seeker. The GROUP BY on the outer query will still produce the same result. Also, the OP shouldn't be using GROUP BY the way it is shown, as that leaves the result implementation dependent.
Oh, sorry, I guess I just misplaced my comment. It was meant for Fabien's answer.
A join is not necessarily a replacement for an IN query. The result might be totally different.
@a_horse_with_no_name agreed - it may generate duplicates, but it can be a replacement for NOT IN as shown in this case.
0

First, for your query, you don't need the GROUP BY in the subquery:

SELECT g.email, g.name
FROM   guest g
WHERE  g.type = 'guest' AND g.email NOT IN (SELECT email FROM seeker)
GROUP  BY g.email

That might be sufficient. With an index on seeker(email), the following should optimize ok:

SELECT g.email, g.name
FROM   guest g
WHERE  g.type = 'guest' AND
       not exists (SELECT 1 FROM seeker where seeker.email = g.email)
GROUP  BY g.email;

If you have lots of duplicates in most tables for email, then I wouldn't recommend a left join approach.
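
One general way the NOT IN and NOT EXISTS forms above can return different rows is NULL handling. The sketch below uses hypothetical temporary tables (demo_guest, demo_seeker) and made-up rows purely for illustration; whether NULL emails are actually present in the real tables is not known from the question.

```sql
-- Hypothetical demo tables, so the real guest/seeker tables are untouched.
CREATE TEMPORARY TABLE demo_guest  (email VARCHAR(100), name VARCHAR(100), type VARCHAR(20));
CREATE TEMPORARY TABLE demo_seeker (email VARCHAR(100));

INSERT INTO demo_guest  VALUES ('a@x.com', 'Ann', 'guest'), ('b@x.com', 'Bob', 'guest');
INSERT INTO demo_seeker VALUES ('a@x.com'), (NULL);

-- NOT IN: comparing against the NULL row yields UNKNOWN, so no guest row qualifies.
SELECT g.email
FROM   demo_guest g
WHERE  g.type = 'guest'
AND    g.email NOT IN (SELECT email FROM demo_seeker);

-- NOT EXISTS: the NULL row never matches, so b@x.com is returned.
SELECT g.email
FROM   demo_guest g
WHERE  g.type = 'guest'
AND    NOT EXISTS (SELECT 1 FROM demo_seeker s WHERE s.email = g.email);
```

Declaring the email columns NOT NULL, or filtering NULLs out of the subquery, removes this particular difference between the two forms.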

3 Comments

I think these two queries are not equivalent: the first one returns 16090 rows, while the second one returns 16810.
It seems you have a typing error here "AND not exists NOT IN...". NOT IN should be removed.
@igr . . . I did. Thank you for noting it.
