How to improve speed of this mysql query processing

Question

I'm currently working on a project that has some MySQL queries. All of the queries in the project were written by another developer. I'm a bit confused by the query below:

SELECT MIN(s_date) AS s_date, 
       client_id
FROM tb1 
WHERE flag = 1 
    AND client_id NOT IN (
        SELECT DISTINCT client_id 
        FROM tb1 
        WHERE flag = 0
    ) 
GROUP BY client_id;

The first part of the query checks flag = 1, and the second part checks NOT IN (flag = 0). I think that's redundant: if the flag is 1, it can't be 0. I don't understand the logic of the query. I also think NOT IN is a bit slow (it takes 2 sec on my database).

Please explain what this query means and how I can simplify and improve it.

Use EXPLAIN to see what indexes your query is using, and then consider adjusting your indexes accordingly.
– Mark Baker, Commented Oct 7, 2014 at 16:16
You may also find that using a JOIN query is more efficient than using a subselect query.
– Mark Baker, Commented Oct 7, 2014 at 16:16
@Mihai Without knowing the actual structure of tb1, I can't say for certain; but more common practice is to have id as the primary key, and client_id would be a foreign key.
– Mark Baker, Commented Oct 7, 2014 at 16:19
Is client_id the primary key? The EXPLAIN output and the table definition would be nice.
– Mihai, Commented Oct 7, 2014 at 16:19
@MarkBaker The subselect is not pointless overhead; it is used to find all client ids that have any record with flag = 0, not all records with flag = 0.
– Mike Brant, Commented Oct 7, 2014 at 16:33

Gordon Linoff · Accepted Answer · 2014-10-07 16:22:38Z

1

You seem to be summarizing clients where the flag is never 0. The query is more simply written as:

SELECT MIN(s_date) AS s_date, 
       client_id
FROM tb1 
WHERE flag IN (0, 1)
GROUP BY client_id
HAVING SUM(flag = 0) = 0;

This may also improve performance.
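
To see why the HAVING clause works, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL (both evaluate flag = 0 to 1 or 0, so SUM(flag = 0) counts the flag = 0 rows in each group). The four rows and the function name having_demo are made up for illustration.

```python
import sqlite3

def having_demo():
    # In-memory SQLite database standing in for the MySQL table tb1.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb1 (client_id INTEGER, flag INTEGER, s_date TEXT)")
    # Hypothetical rows: client 10 never has flag = 0, client 20 has it once.
    conn.executemany(
        "INSERT INTO tb1 VALUES (?, ?, ?)",
        [
            (10, 1, "2014-01-05"),
            (10, 1, "2014-01-02"),
            (20, 1, "2014-02-01"),
            (20, 0, "2014-02-03"),
        ],
    )
    # Step 1: count the flag = 0 rows per client.
    counts = conn.execute(
        "SELECT client_id, SUM(flag = 0) AS zero_rows "
        "FROM tb1 GROUP BY client_id ORDER BY client_id"
    ).fetchall()
    # Step 2: keep only the clients whose count is zero.
    kept = conn.execute(
        "SELECT MIN(s_date) AS s_date, client_id "
        "FROM tb1 WHERE flag IN (0, 1) "
        "GROUP BY client_id HAVING SUM(flag = 0) = 0 "
        "ORDER BY client_id"
    ).fetchall()
    conn.close()
    return counts, kept

if __name__ == "__main__":
    counts, kept = having_demo()
    print(counts)  # per-client number of flag = 0 rows
    print(kept)    # earliest s_date for clients with no flag = 0 rows
```

The table is scanned once and grouped once, with no second pass to build a list of excluded clients, which is one plausible reason this form can be faster.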

4 Comments

spencer7593 Over a year ago

+1. It's likely that this form will give better performance than a query that uses an anti-join or a NOT EXISTS predicate. With this query, MySQL will likely be able to make effective use of an index ON tb1 (client_id, s_date, flag).

agthumoe Over a year ago

awesome, it only takes 0.02 sec to get the same result, cheers :)

agthumoe Over a year ago

But tbh, I don't really understand the code. If you don't mind, please elaborate on it. Thanks

Gordon Linoff Over a year ago

The code checks that none of the rows for a given client have a flag of 0. I'm not sure what needs explaining. The having clause could also be written: having sum(case when flag = 0 then 1 else 0 end) = 0.

Dan Bracuk · Accepted Answer · 2014-10-07 16:23:48Z

0

In most databases, using "not in" is simple, intuitive, but slow. Sometimes you can solve it like this:

where myfield in 
(select myfield 
where I want it
minus
select myfield 
where I want to exclude it)

Some databases use the word except instead of minus. I don't think that works with MySQL, so you have to do something like this:

select somefields
from sometable
left join (
    select idfield, someOtherField
    from blah 
    where I want to exclude it
) temp on sometable.idfield = temp.idfield
where temp.idfield is null

Mike Brant · Accepted Answer · 2014-10-07 16:30:45Z

Here is how to think of it:

In the subselect, you are finding the list of client_ids that have at least one record where flag = 0.

You then exclude that list of ids from the main query.

So if you had sample data like this:

client_id   flag    s_date
---------   ----    ------
1           1       2014-01-01
2           0       2014-02-01
2           1       2014-03-01
3           0       2014-04-01
4           1       2014-05-01
4           1       2014-06-01

Your query would only return:

s_date       client_id
------       ---------
2014-01-01   1
2014-05-01   4
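
The walkthrough above can be checked with a small script. This is a sketch that assumes Python's built-in sqlite3 as a stand-in for MySQL; the column types and the function name run_not_in_query are illustrative choices, and an ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Sample rows from the table above: (client_id, flag, s_date).
ROWS = [
    (1, 1, "2014-01-01"),
    (2, 0, "2014-02-01"),
    (2, 1, "2014-03-01"),
    (3, 0, "2014-04-01"),
    (4, 1, "2014-05-01"),
    (4, 1, "2014-06-01"),
]

# The query from the question, plus ORDER BY for a stable result order.
NOT_IN_QUERY = """
SELECT MIN(s_date) AS s_date, client_id
FROM tb1
WHERE flag = 1
  AND client_id NOT IN (SELECT DISTINCT client_id FROM tb1 WHERE flag = 0)
GROUP BY client_id
ORDER BY client_id
"""

def run_not_in_query():
    # In-memory SQLite database standing in for the MySQL table tb1.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb1 (client_id INTEGER, flag INTEGER, s_date TEXT)")
    conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(NOT_IN_QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for s_date, client_id in run_not_in_query():
        print(s_date, client_id)
```

Clients 2 and 3 each have a flag = 0 row, so the subselect removes them entirely, including client 2's flag = 1 row.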

In your query, the redundant use of flag is actually in the main query. Assuming flag only ever holds 0 or 1, the flag = 1 condition is not needed there, since the subselect has already eliminated every client_id with any flag = 0 row.

As for optimizing the query, this is one of those cases where the subselect may or may not be faster than a join. It really depends on the number of rows of data, the number of those rows that meet the subselect condition, etc. (assuming, of course, that all proper indexing is in place).

You can try a self join like this to see which performs better for you:

SELECT
    MIN(a.s_date) AS s_date,
    a.client_id AS client_id
FROM tb1 AS a LEFT JOIN (
    SELECT DISTINCT client_id 
    FROM tb1 
    WHERE flag = 0
) AS b
ON a.client_id = b.client_id
WHERE b.client_id IS NULL
GROUP BY a.client_id
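
As a quick sanity check, here is a sketch of the self join run against the sample data from this answer. It assumes Python's built-in sqlite3 as a stand-in for MySQL, writes the join against tb1 on both sides, and tests b.client_id for NULL; the function name run_anti_join and the added ORDER BY are illustrative.

```python
import sqlite3

# Sample rows from this answer: (client_id, flag, s_date).
ROWS = [
    (1, 1, "2014-01-01"),
    (2, 0, "2014-02-01"),
    (2, 1, "2014-03-01"),
    (3, 0, "2014-04-01"),
    (4, 1, "2014-05-01"),
    (4, 1, "2014-06-01"),
]

# LEFT JOIN anti-join: rows of a with no match in b come back with
# b.client_id set to NULL, and only those rows are kept.
ANTI_JOIN_QUERY = """
SELECT MIN(a.s_date) AS s_date, a.client_id AS client_id
FROM tb1 AS a
LEFT JOIN (
    SELECT DISTINCT client_id FROM tb1 WHERE flag = 0
) AS b ON a.client_id = b.client_id
WHERE b.client_id IS NULL
GROUP BY a.client_id
ORDER BY a.client_id
"""

def run_anti_join():
    # In-memory SQLite database standing in for the MySQL table tb1.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb1 (client_id INTEGER, flag INTEGER, s_date TEXT)")
    conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(ANTI_JOIN_QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for s_date, client_id in run_anti_join():
        print(s_date, client_id)
```

On this data the join returns the same two rows as the NOT IN version. Timing on a small in-memory table says nothing about MySQL performance, so compare the two forms on your real table.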

Also try the answer by @GordonLinoff, which is another creative option for getting the same query result.

thanks for the explanation, it makes sense now. And your code really works. it only takes 1 sec to get the result. cheers
