I have a table which contains nearly 10 million records. I want to find the max record of each group. Here is my SQL:

SELECT * 
FROM t 
WHERE id IN (SELECT max(id) AS id 
             FROM t 
             WHERE a = 'some' AND b = 0 
             GROUP BY c, d);

The table is declared as follows.

CREATE TABLE `t` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `a` varchar(32) NOT NULL COMMENT 'a',
  `b` tinyint(3) unsigned NOT NULL COMMENT 'b',
  `c` bigint(20) unsigned NOT NULL COMMENT 'c',
  `d` varchar(32) NOT NULL COMMENT 'd',
  PRIMARY KEY (`id`),
  KEY `idx_c_d` (`c`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='test table';

I have a composite index on c and d, so the subquery on its own (SELECT max(id) AS id FROM t WHERE a = 'some' AND b = 0 GROUP BY c, d) executes in 200 ms. But the whole statement takes nearly 6 seconds (the result contains 5000 rows). Here is the EXPLAIN output (some columns are omitted).

+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| select_type | table | type  | possible_keys |  key   |  rows   | filtered |          Extra           |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| PRIMARY     | t     | ALL   | NULL          | NULL   | 9926024 |   100.00 | Using where              |
| SUBQUERY    | t     | index | idx_1         | idx_1  | 9926024 |     1.00 | Using where; Using index |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
  • What's 1000W?.. Commented May 15, 2019 at 6:29
  • Watt. 1000W - that's a huge guitar amp! Commented May 15, 2019 at 6:35
  • "Although the result only contains 5000 rows": the time a query takes usually doesn't depend on the number of results it returns, but on the amount of data it has to look through. If you have 1000000 books in no particular order (no index) and you look for one that you don't have, you will have to check all 1000000 books, so it takes a long time to get 0 results. Commented May 15, 2019 at 6:54
  • @nacho Yes, I know that. I just wanted to provide more information about this question. Thank you all the same. Commented May 15, 2019 at 6:57
  • The sub query using IN like that is effectively not using the index when checking if an id is one of 5000 records. Hence the likely cause of the slowness. Commented May 15, 2019 at 8:31

4 Answers

There are many ways to skin a cat, but here is a slightly different one. Since the outer query only needs the ids produced by the IN subquery, I would move that subquery to the front as a derived table and join the table back to it. It MAY also help to use MySQL's specific keyword STRAIGHT_JOIN, which tells MySQL to join the tables in the order you have listed them. Again, it MAY help.

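-- Derived table PQ: one max id per (c, d) group among the filtered rows
SELECT 
      t.* 
   FROM 
      (SELECT max(id) AS id 
          FROM t 
          WHERE b = 0 
             AND a = 'some' 
          GROUP BY c, d) PQ
      -- Join back on the primary key; lowercase t matches the table definition
      JOIN t
         ON PQ.id = t.id;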

I would also add an index with the columns in this specific order:

(b, a, c, d, id)
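
As a minimal sketch, that index could be created as follows. The index name idx_b_a_c_d_id is made up for illustration, and the EXPLAIN is only there to check whether the derived-table query picks the index up; none of this has been run against the asker's data.

```sql
-- Hypothetical index name. Column order follows the suggestion above:
-- the equality filters (b, a) first, then the GROUP BY columns (c, d), then id.
ALTER TABLE t
  ADD INDEX idx_b_a_c_d_id (b, a, c, d, id);

-- Inspect the plan of the grouped subquery on its own.
EXPLAIN
SELECT MAX(id) AS id
FROM t
WHERE b = 0
  AND a = 'some'
GROUP BY c, d;
```

With the equality columns leading, the grouped MAX(id) can in principle be answered from the index alone, without reading the table rows.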

Obviously, keep the primary key on id. If you use STRAIGHT_JOIN, the query would start like this:

SELECT STRAIGHT_JOIN 
      t.* ( ... rest of query) 

1 Comment

MySQL is very likely to do the subquery first, regardless of ordering, and without needing to say STRAIGHT_JOIN. And, yes, that 5-column index is beneficial to the derived table. Unless this fails to get the 'right' answer, I predict that it is the 'fastest'.

You can try a correlated subquery, together with an index on columns c and d:

SELECT t1.* FROM table_name t1 
WHERE id = (SELECT max(id) AS id FROM table_name t2 where
             t1.c=t2.c and t1.d=t2.d
            ) and t1.a = 'some' AND t1.b = 0 

9 Comments

Group by c,d is missing in the sub query.
@mkRabbani No GROUP BY is needed here :)
@Strawberry Yes, you are right, the WHERE was missing. Thanks.
As the OP wants the MAX record of each group, I feel GROUP BY is required, and the condition should be WHERE id IN (.....). Your query will return one single row from all rows.
@mkRabbani No, it will return the max row for each group, not a single row overall. You can try it in a fiddle :)

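This avoids the need for a subquery:

SELECT t1.*
FROM t t1
-- Look for a later row (higher id) in the same (c, d) group and filter
LEFT OUTER JOIN t t2
ON t1.c = t2.c
AND t1.d = t2.d
AND t1.id < t2.id
AND t2.a = 'some' 
AND t2.b = 0 
-- Keep only filtered rows for which no later row exists
WHERE t1.a = 'some'
AND t1.b = 0
AND t2.id IS NULL;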

4 Comments

There are some mistakes in the SQL, and it still doesn't satisfy my requirement. It's very slow too.
@weaver, post the table definition and I can test it. But it should be quicker if you have a useful index (on a, b, c, d and id, in that order).
I just added the table definition. I think IN may be the most effective way to solve this problem.
@weaver - I realised I made a mistake with the a and b columns. However, IN with a subquery is probably not efficient for this, as it will force a non-keyed join between the 5000 records from your subquery and the 10 million records in the table. Adding a suitable index will help a LOT when not using a subquery.

I recommend using a correlated subquery:

SELECT t.* 
FROM t 
WHERE t.id IN (SELECT MAX(t2.id)
            FROM t t2
            WHERE t2.c = t.c AND t2.d = t.d AND
                  t2.a = 'some' AND t2.b = 0
           );

This assumes that id is unique in the table.

For performance, you want an index on (c, d, a, b, id).
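
One possible way to add that index is sketched below; the name idx_c_d_a_b_id is hypothetical. Because the existing idx_c_d is a left prefix of the new index, it would probably become redundant, but whether to drop it is left as a separate decision.

```sql
-- Hypothetical index name. (c, d) lead so that the correlated lookup
-- on t2.c = t.c AND t2.d = t.d can use the index, with a, b and id
-- available in the index for the remaining filters and MAX(t2.id).
ALTER TABLE t
  ADD INDEX idx_c_d_a_b_id (c, d, a, b, id);
```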

2 Comments

I want to find the max id of each group. I'm afraid your SQL doesn't satisfy that requirement.
@weaver . . . How do you define "group"? This follows the definition that you imply in your question.
