I'm currently working on a project that has some MySQL queries. All of the MySQL queries in the project were written by another developer. I'm a bit confused by the query of his below:

SELECT MIN(s_date) AS s_date, 
       client_id
FROM tb1 
WHERE flag = 1 
    AND client_id NOT IN (
        SELECT DISTINCT client_id 
        FROM tb1 
        WHERE flag = 0
    ) 
GROUP BY client_id;

The first part of the query checks flag = 1 and the second part checks NOT IN (flag = 0). I think that's redundant: if flag = 1, it can't be 0. I don't understand the logic of this query. I also think that NOT IN is a bit slow (it takes 2 sec on my database).

Please explain what this query means and how I can simplify and improve it.

5 Comments

  • Use EXPLAIN to see what indexes your query is using, and then consider adjusting your indexes accordingly. Commented Oct 7, 2014 at 16:16
  • You may also find that using a JOIN query is more efficient than using a subselect query. Commented Oct 7, 2014 at 16:16
  • @Mihai Without knowing the actual structure of tb1, I can't say for certain, but the more common practice is to have id as the primary key, and client_id would be a foreign key. Commented Oct 7, 2014 at 16:19
  • Is client_id the primary key? EXPLAIN output and the table definition would be nice. Commented Oct 7, 2014 at 16:19
  • @MarkBaker The subselect is not pointless overhead. It is used to find all client ids that have any record with flag = 0, not all records with flag = 0. Commented Oct 7, 2014 at 16:33

3 Answers

1

You seem to be summarizing clients where the flag is never 0. The query is more simply written as:

SELECT MIN(s_date) AS s_date, 
       client_id
FROM tb1 
WHERE flag IN (0, 1)
GROUP BY client_id
HAVING SUM(flag = 0) = 0;

This may also improve performance.
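
To see why the HAVING clause works, it may help to look at the value it tests. The following is only a diagnostic sketch against the question's tb1; the aliases rows_considered and rows_with_flag_0 are made up for illustration.

```sql
-- Diagnostic sketch: expose the value that the HAVING clause tests.
-- In MySQL a comparison such as (flag = 0) evaluates to 1 or 0,
-- so SUM(flag = 0) counts the rows per client whose flag is 0.
SELECT client_id,
       COUNT(*)      AS rows_considered,
       SUM(flag = 0) AS rows_with_flag_0
FROM tb1
WHERE flag IN (0, 1)
GROUP BY client_id;
```

Clients whose rows_with_flag_0 is 0 are the ones that HAVING SUM(flag = 0) = 0 keeps.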

4 Comments

+1. It's likely that this form will give better performance than a query that uses an anti-join or a NOT EXISTS predicate. With this query, MySQL will likely be able to make effective use of an index on tb1 (client_id, s_date, flag).
Awesome, it only takes 0.02 sec to get the same result, cheers :)
But tbh, I don't really understand the code. If you don't mind, please elaborate. Thanks.
The code checks that none of the rows for a given client have a flag of 0. I'm not sure what needs explaining. The having clause could also be written: having sum(case when flag = 0 then 1 else 0 end) = 0.
0

In most databases, using "not in" is simple, intuitive, but slow. Sometimes you can solve it like this:

where myfield in 
(select myfield 
where I want it
minus
select myfield 
where I want to exclude it)

Some databases use the word EXCEPT instead of MINUS. I don't think that works with MySQL, so you have to do something like this:

select somefields
from sometables
left join (
    select idfield, someOtherField
    from blah 
    where I want to exclude it
) temp on sometables.idfield = temp.idfield
where temp.idfield is null
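
Applied to the question's tb1, the same "exclude the matches" idea could also be sketched with NOT EXISTS, which a commenter on the first answer mentions as an alternative. This is a sketch rather than a tested replacement; the aliases t and z are made up here.

```sql
-- Sketch: keep flag = 1 rows only for clients that have no flag = 0 row.
SELECT MIN(t.s_date) AS s_date,
       t.client_id
FROM tb1 AS t
WHERE t.flag = 1
  AND NOT EXISTS (
        -- Correlated subquery: is there any flag = 0 row for this client?
        SELECT 1
        FROM tb1 AS z
        WHERE z.client_id = t.client_id
          AND z.flag = 0
      )
GROUP BY t.client_id;
```

Unlike NOT IN, NOT EXISTS is not thrown off if the subquery happens to produce a NULL client_id.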

0

Here is how to think of it:

In the subselect, you are finding the list of client_ids that have at least one record where flag = 0.

You then exclude that list of ids from the main query.

So if you had sample data like this:

client_id   flag    s_date
---------   ----    ------
1           1       2014-01-01
2           0       2014-02-01
2           1       2014-03-01
3           0       2014-04-01
4           1       2014-05-01
4           1       2014-06-01

Your query would only return:

s_date       client_id
------       ---------
2014-01-01   1
2014-05-01   4
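
To reproduce this, the sample rows can be loaded into a scratch table. The column types below are assumptions, since the question does not show the definition of tb1.

```sql
-- Assumed column types; the real definition of tb1 is not shown in the question.
CREATE TABLE tb1 (
    client_id INT     NOT NULL,
    flag      TINYINT NOT NULL,
    s_date    DATE    NOT NULL
);

INSERT INTO tb1 (client_id, flag, s_date) VALUES
    (1, 1, '2014-01-01'),
    (2, 0, '2014-02-01'),
    (2, 1, '2014-03-01'),
    (3, 0, '2014-04-01'),
    (4, 1, '2014-05-01'),
    (4, 1, '2014-06-01');

-- The query from the question, unchanged.
SELECT MIN(s_date) AS s_date,
       client_id
FROM tb1
WHERE flag = 1
  AND client_id NOT IN (
        SELECT DISTINCT client_id
        FROM tb1
        WHERE flag = 0
      )
GROUP BY client_id;
```

Clients 2 and 3 each have a flag = 0 row, so only clients 1 and 4 survive the NOT IN filter.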

In your query, the redundant use of flag is actually in the main query. Assuming flag only ever holds 0 or 1, the flag = 1 test is not needed there at all, since the subselect has already eliminated every client_id that has any flag = 0 row.

As for optimizing the query, this is one of those cases where the subselect may or may not be faster than a join. It really depends on the number of rows, how many of those rows meet the subselect condition, and so on (assuming, of course, that proper indexing is in place).
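
As a starting point for the indexing side, something like the following could be tried. The column list is the one suggested in a comment on the first answer; the index name is made up, and whether the optimizer actually uses it depends on your data.

```sql
-- Hypothetical index name; columns as suggested in the comment on the first answer.
CREATE INDEX idx_tb1_client_date_flag
    ON tb1 (client_id, s_date, flag);

-- EXPLAIN shows which indexes MySQL chooses for the question's query.
EXPLAIN
SELECT MIN(s_date) AS s_date,
       client_id
FROM tb1
WHERE flag = 1
  AND client_id NOT IN (
        SELECT DISTINCT client_id
        FROM tb1
        WHERE flag = 0
      )
GROUP BY client_id;
```

Running EXPLAIN on each candidate rewrite as well makes it easier to compare them on more than wall-clock time.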

You can try a self join like this to see which performs better for you:

SELECT
    MIN(a.s_date) AS s_date,
    a.client_id AS client_id
FROM tb1 AS a LEFT JOIN (
    SELECT DISTINCT client_id 
    FROM tb1 
    WHERE flag = 0
) AS b
ON a.client_id = b.client_id
WHERE b.client_id IS NULL
GROUP BY a.client_id;

Also try the answer by @GordonLinoff, which is another creative way to get the same result.

1 Comment

Thanks for the explanation, it makes sense now. And your code really works: it only takes 1 sec to get the result. Cheers.
