0

I have this query, but it takes too long: about 30 seconds when run through Navicat. Can it be optimized, and if so, how?

SELECT DISTINCT c.clientid, c.name, c.email, c.region 
FROM clients c RIGHT JOIN orders o ON c.clientid = o.clientid 
WHERE o.order_status = 'pending' 
AND c.clientid NOT IN (
    SELECT DISTINCT c.clientid 
    FROM clients c, orders o
    WHERE c.clientid = o.clientid AND o.order_status = 'paid'
    ) 
ORDER BY c.id DESC

To make clear what I need, I have two tables:

clients (id, clientid, name, email, region) 
orders (id, orderid, clientid, order_amount, order_status, ….)

Example of records:

Client | Order | Status
-----------------------
C1     | O1    | (paid)
C1     | O2    | (pending)
C2     | O3    | (paid)
C3     | O4    | (pending)
C4     | O5    | (paid)
C5     | O6    | (pending)

I need to return only C3 and C5, that is, the clients that have a pending order and no paid order.

Many thanks for your answers.

6
  • Why are you checking o.order_status twice? Does every order have exactly one status? Commented Dec 14, 2011 at 16:51
  • Are you sure you want to be doing a RIGHT JOIN? It seems like you really want an INNER JOIN. (Actually it seems like you really want an IN clause, but I can imagine performance reasons forcing you to use a JOIN instead.) Commented Dec 14, 2011 at 16:52
  • @ajreal: Every order has one status, but one client can have multiple orders. The OP wants to find every client that does have a "pending" order and does not have any "paid" orders. Commented Dec 14, 2011 at 16:53
  • If you want optimization, you should also provide the tables' definitions and what indexes you have. Commented Dec 14, 2011 at 17:13
  • What I don't understand is why you have two columns that seem to serve the same purpose (primary key) in both tables (id and clientid in table clients). Commented Dec 14, 2011 at 17:21

5 Answers

1

There are lots of ways; here is one trick:

SELECT c.clientid, c.name, c.email, c.region,
  SUM(IF(o.order_status = 'paid', 1, 0)) as paid
FROM clients c
INNER JOIN orders o 
ON c.clientid = o.clientid 
WHERE o.order_status IN( 'pending', 'paid')
GROUP BY c.clientid
HAVING paid = 0;
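
To sanity-check the logic of this conditional-aggregation trick, here is a minimal sketch that runs it against the six sample rows from the question. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, it writes the MySQL IF(...) as a portable CASE expression, and it leaves out the name, email and region columns. The helper make_db and the variable names are made up for the example.

```python
import sqlite3

# Sample data copied from the question: (client, order, status).
ROWS = [
    ("C1", "O1", "paid"), ("C1", "O2", "pending"), ("C2", "O3", "paid"),
    ("C3", "O4", "pending"), ("C4", "O5", "paid"), ("C5", "O6", "pending"),
]

def make_db():
    """Build an in-memory database shaped roughly like the question's tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, clientid TEXT)")
    db.execute("CREATE TABLE orders (orderid TEXT, clientid TEXT, order_status TEXT)")
    clients = sorted({c for c, _, _ in ROWS})
    db.executemany("INSERT INTO clients (clientid) VALUES (?)", [(c,) for c in clients])
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(o, c, s) for c, o, s in ROWS])
    return db

# Same shape as the query above; CASE stands in for MySQL's IF().
QUERY = """
SELECT c.clientid,
       SUM(CASE WHEN o.order_status = 'paid' THEN 1 ELSE 0 END) AS paid
FROM clients c
INNER JOIN orders o ON c.clientid = o.clientid
WHERE o.order_status IN ('pending', 'paid')
GROUP BY c.clientid
HAVING paid = 0
ORDER BY c.clientid
"""

# Clients whose pending/paid orders include no paid one.
pending_only = [row[0] for row in make_db().execute(QUERY)]
print(pending_only)
```

The WHERE clause keeps only pending and paid orders, so a group whose paid count is zero must consist of pending orders only, which is why no separate check for "has a pending order" is needed.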

7 Comments

Oh, nice idea, with the SUM, way more logical than mine :p
Wow! The query took 5.281 s! Thanks a lot! Even after I added ORDER BY c.id DESC it took 5.435 s.
No problem. But I checked your other comments: you only have a few thousand records, so maybe you should create a new question to discuss your index strategy.
Specifically, do you have an index on orders.clientid? 5 sec is way too long unless you're running this on your phone!
Will you think about it at 50K rows? And maybe decide at 500K rows? Save yourself some CPU hours (or days) and add indexes earlier. All columns that are used in joins should have an index, and so should almost all columns that are used in WHERE conditions of frequently run queries.
1

Not sure how well this will perform, but try something like:

SELECT DISTINCT c.clientid, c.name, c.email, c.region 
FROM clients c
RIGHT JOIN orders o ON c.clientid = o.clientid AND o.order_status = 'pending'
LEFT JOIN orders o2 ON o.clientid = o2.clientid AND o2.order_status = 'paid'
WHERE o2.clientid IS NULL

Basically, try to match up a pending and a paid order, and take only the pending orders where this fails.

On the pro side, you don't have the million subqueries. A con is that the number of generated rows before the WHERE culls them is potentially much larger. So I don't know whether it'd help or hurt.
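
To make the row-count concern concrete, here is a small sketch of the same pending-versus-paid self-join on made-up data. It assumes Python's built-in sqlite3 module instead of MySQL, a cut-down orders table, and two hypothetical clients (A and B) that are not in the question. A client with 3 pending and 4 paid orders produces 3 x 4 = 12 joined rows before the IS NULL filter discards all of them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (clientid TEXT, order_status TEXT)")
# Hypothetical data: client A has 3 pending and 4 paid orders,
# client B has 2 pending orders and nothing paid.
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", "pending")] * 3 + [("A", "paid")] * 4 + [("B", "pending")] * 2,
)

# Pair every pending order with every paid order of the same client.
JOINED = """
FROM orders o
LEFT JOIN orders o2
  ON o2.clientid = o.clientid AND o2.order_status = 'paid'
WHERE o.order_status = 'pending'
"""

# Rows the join generates before the anti-join filter is applied.
rows_before_filter = db.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]

# Clients left once pending orders that found a paid partner are removed.
survivors = [r[0] for r in db.execute(
    "SELECT DISTINCT o.clientid " + JOINED + " AND o2.clientid IS NULL")]

print(rows_before_filter, survivors)
```

Whether this intermediate blow-up matters in practice depends on how many paid orders a typical client has and on the indexes available, so it is worth measuring on the real data.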

EDIT: Also, yeah, like @ruakh in comments, I wondered why the RIGHT JOIN there... can an order have zero clients, or am I missing something?

1 Comment

I liked your approach a lot, but your query took 38.937 s and mine took 30.906 s. On top of that, the returned records are different (?!), which they shouldn't be: 1486 for yours and 805 for mine. Total records in the table: 3501.
1

There are some great ideas here, but trying to optimize a query without knowing what is going on in the database engine isn't the most direct route to the best answer. Sometimes optimizing just requires an additional index, not a change to the SQL.

The first thing you should do is look at an explain plan (documentation for 5.1) and then decide if you can change the query or add indexes or something else. Probably one of the answers provided is correct, but without the execution plan you're just guessing.
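
As a rough illustration of what reading a plan before and after adding an index looks like, here is a sketch using Python's built-in sqlite3 module, which is an assumption made to keep the example self-contained. SQLite spells the command EXPLAIN QUERY PLAN, whereas in MySQL you would prefix the query with EXPLAIN and read its output columns instead. The index name idx_orders_client_status and the probe query are invented for the example; the right index for the real tables depends on their definitions, which the question does not show.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (orderid TEXT, clientid TEXT, order_status TEXT)")

# The kind of lookup every variant of the query performs per client.
PROBE = "SELECT 1 FROM orders WHERE clientid = 'C1' AND order_status = 'paid'"

def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return " | ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index the only option is to scan the whole table.
plan_before = plan(PROBE)

# Hypothetical composite index on the join column plus the filtered column.
db.execute("CREATE INDEX idx_orders_client_status ON orders (clientid, order_status)")
plan_after = plan(PROBE)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a table scan and the second one names the index.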

A couple of thoughts on your query.

I don't understand why you need the RIGHT JOIN. Since you are after the clients, an INNER JOIN should be sufficient.

Any query that uses DISTINCT or GROUP BY will usually need a final sort or a temporary table. If the number of rows that need to be sorted (clients x orders) is large, it will hurt performance. If it is, @ypercube's approach might be good; otherwise @ajreal's trick looks promising. Good luck.

Edit: Here is an interesting blog on this type of query and several approaches.


0

Something like this would be better:

SELECT DISTINCT c.clientid, c.name, c.email, c.region 
FROM clients c 
INNER JOIN orders o ON c.clientid = o.clientid 
LEFT OUTER JOIN (
    -- clients that have at least one paid order
    SELECT cc.clientid
    FROM clients cc 
    INNER JOIN orders oo ON cc.clientid = oo.clientid
    WHERE oo.order_status = 'paid'
    GROUP BY cc.clientid
) cp ON cp.clientid = c.clientid
WHERE o.order_status = 'pending' 
  AND cp.clientid IS NULL  -- keep only clients with no paid order
ORDER BY c.id DESC

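If your tables are big, try to avoid IN (subquery) and OR conditions in your queries: they can keep MySQL from using indexes effectively (older versions often run a NOT IN subquery once per outer row). Also, your subquery used an implicit comma join instead of an explicit INNER JOIN, which is harder to read.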


0

Using EXISTS:

SELECT c.clientid, c.name, c.email, c.region 
FROM clients c 
WHERE EXISTS
      ( SELECT *
        FROM orders o 
        WHERE o.clientid = c.clientid 
          AND o.order_status = 'pending'
      ) 
  AND NOT EXISTS
      ( SELECT *
        FROM orders o 
        WHERE o.clientid = c.clientid 
          AND o.order_status = 'paid'
      ) 
ORDER BY c.id DESC

Using JOIN:

SELECT c.clientid, c.name, c.email, c.region 
FROM clients c 
  JOIN orders o
    ON  o.clientid = c.clientid 
    AND o.order_status = 'pending'
  LEFT JOIN orders o2
    ON  o2.clientid = c.clientid 
    AND o2.order_status = 'paid'
WHERE o2.clientid IS NULL
GROUP BY c.clientid
ORDER BY c.id DESC
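
As a quick check that the two forms agree, here is a sketch that runs both against the sample rows from the question. It assumes Python's built-in sqlite3 module with an in-memory database rather than MySQL, cut-down tables holding only the columns the logic needs, and client ids assigned 1 to 5 in the order C1 to C5; the helper make_db is made up for the example.

```python
import sqlite3

# Sample data copied from the question: (client, order, status).
ROWS = [
    ("C1", "O1", "paid"), ("C1", "O2", "pending"), ("C2", "O3", "paid"),
    ("C3", "O4", "pending"), ("C4", "O5", "paid"), ("C5", "O6", "pending"),
]

def make_db():
    """Build an in-memory database shaped roughly like the question's tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, clientid TEXT)")
    db.execute("CREATE TABLE orders (orderid TEXT, clientid TEXT, order_status TEXT)")
    clients = sorted({c for c, _, _ in ROWS})
    db.executemany("INSERT INTO clients (clientid) VALUES (?)", [(c,) for c in clients])
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(o, c, s) for c, o, s in ROWS])
    return db

EXISTS_QUERY = """
SELECT c.clientid
FROM clients c
WHERE EXISTS (SELECT * FROM orders o
              WHERE o.clientid = c.clientid AND o.order_status = 'pending')
  AND NOT EXISTS (SELECT * FROM orders o
                  WHERE o.clientid = c.clientid AND o.order_status = 'paid')
ORDER BY c.id DESC
"""

JOIN_QUERY = """
SELECT c.clientid
FROM clients c
  JOIN orders o
    ON o.clientid = c.clientid AND o.order_status = 'pending'
  LEFT JOIN orders o2
    ON o2.clientid = c.clientid AND o2.order_status = 'paid'
WHERE o2.clientid IS NULL
GROUP BY c.clientid
ORDER BY c.id DESC
"""

db = make_db()
via_exists = [row[0] for row in db.execute(EXISTS_QUERY)]
via_join = [row[0] for row in db.execute(JOIN_QUERY)]
print(via_exists, via_join)
```

Agreement on six rows says nothing about speed; which form is faster on the real tables depends on the engine and the indexes.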

What I don't understand is why you have two columns that seem to serve the same purpose (primary key) in both tables (id and clientid in table clients, and likewise id and orderid in table orders).

10 Comments

The first query is not really that performant. The subqueries in the WHERE clause have to run for every row returned from the main query. If you have thousands of clients that's going to hurt. The second one is way better.
@FranciscoSoto: Is that a hunch, or have you tested this?
@FranciscoSoto: See the answer of Quassnoi for a way to bypass this and an explanation of why it may be better (or worse): stackoverflow.com/questions/1766702/…
EXISTS took 20.062 s and the JOIN 13.844 s. Good enough compared with my query at 38 s. Thanks for the help.
@FranciscoSoto A join must do the lookup for each row in the base table. Databases such as Oracle and DB2 (don't know about MySQL) can sometimes optimize an EXISTS into a join (and even do a short-circuit evaluation). Bottom line, you have to know what the database is doing.