MySQL GROUP BY with sorting

Question

I'm having some trouble writing concise SQL that produces the desired result efficiently on a database with several million records.

- Items are grouped by time.
- Within each time, one item is selected by provider: B takes precedence over A (and C over B).
- The value must be the one belonging to the selected provider.

Given table vs. desired result:

// given this table
id | provider | time       | value
---+----------+------------+-----------
 1 |    A     | 2013-07-01 |  0.1
 2 |    A     | 2013-07-02 |  0.2
 3 |    B     | 2013-07-02 |  0.3
 4 |    A     | 2013-07-03 |  0.4

// expected result
id | provider | time       | value
---+----------+------------+-----------
 1 |    A     | 2013-07-01 |  0.1
 3 |    B     | 2013-07-02 |  0.3
 4 |    A     | 2013-07-03 |  0.4
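To pin down the selection rule before writing any SQL, here is a minimal plain-Python sketch that applies it to the sample rows. The `PRIORITY` mapping and the `pick_per_time` helper are hypothetical names introduced for illustration; it simply keeps, for each time value, the row whose provider ranks highest.

```python
from itertools import groupby

# Hypothetical precedence ranking: a higher number wins (C over B over A).
PRIORITY = {"A": 1, "B": 2, "C": 3}

# Sample rows from the question: (id, provider, time, value)
ROWS = [
    (1, "A", "2013-07-01", 0.1),
    (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3),
    (4, "A", "2013-07-03", 0.4),
]

def pick_per_time(rows):
    """Keep one row per time value: the one whose provider has the highest priority."""
    ordered = sorted(rows, key=lambda r: r[2])  # groupby only merges adjacent rows
    result = []
    for _, group in groupby(ordered, key=lambda r: r[2]):
        result.append(max(group, key=lambda r: PRIORITY[r[1]]))
    return result

for row in pick_per_time(ROWS):
    print(row)
```

This reproduces the expected result table (ids 1, 3 and 4) and serves only as a reference for checking the SQL versions against.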

The statements to create the table and populate it:

CREATE TABLE `data_teste` (`id` int(11) unsigned NOT NULL AUTO_INCREMENT, `provider` varchar(12) NOT NULL, `time` date NOT NULL, `value` double NOT NULL,
  PRIMARY KEY (`id`), UNIQUE KEY `index` (`provider`,`time`), KEY `provider` (`provider`), KEY `time` (`time`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO data_teste (`provider`, `time`, `value`) VALUES ('A', '2013-07-01', 0.1), ('A', '2013-07-02', 0.2), ('B', '2013-07-02', 0.3), ('A', '2013-07-03', 0.4);

This is a variant of the classic greatest-n-per-group problem: keep exactly one row per time value.
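For reference, the greatest-n-per-group pattern can also be written with a window function. This is only a sketch under assumptions: MySQL gained window functions in 8.0, which postdates this question, so it would not run on the MySQL of 2013. Python's built-in sqlite3 module (SQLite 3.25 or newer) stands in for MySQL here, with a simplified table. Like `max(provider)`, `ORDER BY provider DESC` assumes precedence matches alphabetical order (C > B > A).

```python
import sqlite3

# In-memory stand-in for the MySQL table; SQLite accepts MySQL-style backticks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_teste (id INTEGER PRIMARY KEY, provider TEXT NOT NULL,"
             " `time` TEXT NOT NULL, value REAL NOT NULL, UNIQUE (provider, `time`))")
conn.executemany("INSERT INTO data_teste VALUES (?, ?, ?, ?)", [
    (1, "A", "2013-07-01", 0.1), (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3), (4, "A", "2013-07-03", 0.4)])

QUERY = """
SELECT id, provider, `time`, value
FROM (
    -- number the rows within each time value, best provider first
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY `time` ORDER BY provider DESC) AS rn
    FROM data_teste d
) ranked
WHERE rn = 1
ORDER BY `time`
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```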

Thank you very much.

@EvanMulawski There are different providers that bring in data as a time series, but when there is overlap in the time field I want the data from provider B to take precedence over A (and so on).
– Frankie, commented Jul 18, 2013 at 17:22
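If the real provider names do not happen to sort in precedence order, one option is to store the precedence explicitly and take the maximum of that instead of `max(provider)`. The sketch below assumes a hypothetical `provider_priority` lookup table (not part of the question's schema) and uses Python's sqlite3 as an in-memory stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_teste (id INTEGER PRIMARY KEY, provider TEXT NOT NULL,"
             " `time` TEXT NOT NULL, value REAL NOT NULL, UNIQUE (provider, `time`))")
conn.executemany("INSERT INTO data_teste VALUES (?, ?, ?, ?)", [
    (1, "A", "2013-07-01", 0.1), (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3), (4, "A", "2013-07-03", 0.4)])

# Hypothetical lookup table: a higher priority number wins.
conn.execute("CREATE TABLE provider_priority (provider TEXT PRIMARY KEY, priority INTEGER NOT NULL)")
conn.executemany("INSERT INTO provider_priority VALUES (?, ?)", [("A", 1), ("B", 2), ("C", 3)])

QUERY = """
SELECT d.id, d.provider, d.`time`, d.value
FROM data_teste d
JOIN provider_priority p ON p.provider = d.provider
JOIN (
    -- best priority available for each time value
    SELECT t.`time`, MAX(tp.priority) AS best
    FROM data_teste t
    JOIN provider_priority tp ON tp.provider = t.provider
    GROUP BY t.`time`
) x ON x.`time` = d.`time` AND x.best = p.priority
ORDER BY d.`time`
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Changing precedence then only means updating `provider_priority`; for example, giving A a higher number than B would select id 2 instead of id 3 for 2013-07-02.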

juergen d · Accepted Answer · 2013-07-18 17:13:23Z

1

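-- max(provider) picks the alphabetically greatest provider per time (C > B > A)
select d.*
from data_teste d
inner join
(
   -- one row per time value, paired with its winning provider
   select `time`, max(provider) mp
   from data_teste
   group by `time`
) x on x.mp = d.provider
    and x.`time` = d.`time`
order by `time` asc,
         provider desc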
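A quick way to check this query against the sample data is to run it in an in-memory database. The sketch assumes Python's sqlite3 as a stand-in for MySQL (SQLite accepts the backtick quoting); the only change to the query is qualifying the ORDER BY columns with `d.`, to avoid any ambiguity between `d.time` and `x.time` in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_teste (id INTEGER PRIMARY KEY, provider TEXT NOT NULL,"
             " `time` TEXT NOT NULL, value REAL NOT NULL, UNIQUE (provider, `time`))")
conn.executemany("INSERT INTO data_teste VALUES (?, ?, ?, ?)", [
    (1, "A", "2013-07-01", 0.1), (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3), (4, "A", "2013-07-03", 0.4)])

QUERY = """
select d.*
from data_teste d
inner join
(
   -- winning (alphabetically greatest) provider for each time value
   select `time`, max(provider) mp
   from data_teste
   group by `time`
) x on x.mp = d.provider
    and x.`time` = d.`time`
order by d.`time` asc, d.provider desc
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The join works because (provider, time) is unique, so each (time, winning provider) pair from the derived table matches exactly one row.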

answered Jul 18, 2013 at 17:13

juergen d



3 Comments

Frankie Over a year ago

This is definitely more elegant than what I had. It still runs slowly (up to 4 seconds), but it's so much more concise that I can probably speed it up by requesting only limited buckets of time. Thanks!

juergen d Over a year ago

You can use EXPLAIN SELECT ... to see where the performance bottleneck is.

Frankie Over a year ago

All the indexes look good. The main bottleneck is having to group a dataset of this size. It can be (and is) controlled by limiting the time range. Thanks!
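Limiting the time range, as described in the comment above, can be done by filtering inside the derived table: the join then restricts the outer rows to the same range automatically. A minimal sketch, again using Python's sqlite3 as an in-memory stand-in for MySQL; the `?` placeholders are sqlite3's parameter style, and other drivers may use a different one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_teste (id INTEGER PRIMARY KEY, provider TEXT NOT NULL,"
             " `time` TEXT NOT NULL, value REAL NOT NULL, UNIQUE (provider, `time`))")
conn.executemany("INSERT INTO data_teste VALUES (?, ?, ?, ?)", [
    (1, "A", "2013-07-01", 0.1), (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3), (4, "A", "2013-07-03", 0.4)])

QUERY = """
select d.*
from data_teste d
inner join
(
   -- only group the time values inside the requested range
   select `time`, max(provider) mp
   from data_teste
   where `time` between ? and ?
   group by `time`
) x on x.mp = d.provider
    and x.`time` = d.`time`
order by d.`time`
"""
# ISO-8601 date strings compare correctly as text in SQLite.
result = conn.execute(QUERY, ("2013-07-02", "2013-07-03")).fetchall()
for row in result:
    print(row)
```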

Akash · Accepted Answer · 2013-07-18 17:39:00Z

0

How well does this perform?

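-- keep a row only if no row with the same time has a "greater" provider
SELECT
  dt1.*
FROM
  `data_teste` dt1
   LEFT JOIN `data_teste` dt2 ON ( dt2.time = dt1.time
                                    AND dt2.provider > dt1.provider )
WHERE
  -- no higher-precedence provider was found for this time
  dt2.id IS NULL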
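This anti-join can be checked the same way. The sketch below uses Python's sqlite3 as an in-memory stand-in for MySQL and selects only `dt1.*`, because with `SELECT *` every result row would also carry dt2's columns, which are always NULL for the rows that survive the `WHERE` clause. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_teste (id INTEGER PRIMARY KEY, provider TEXT NOT NULL,"
             " `time` TEXT NOT NULL, value REAL NOT NULL, UNIQUE (provider, `time`))")
conn.executemany("INSERT INTO data_teste VALUES (?, ?, ?, ?)", [
    (1, "A", "2013-07-01", 0.1), (2, "A", "2013-07-02", 0.2),
    (3, "B", "2013-07-02", 0.3), (4, "A", "2013-07-03", 0.4)])

QUERY = """
SELECT dt1.*
FROM data_teste dt1
LEFT JOIN data_teste dt2
       ON dt2.`time` = dt1.`time`
      AND dt2.provider > dt1.provider
-- rows with no 'greater' provider at the same time are the winners
WHERE dt2.id IS NULL
ORDER BY dt1.`time`
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Row 2 (A on 2013-07-02) is dropped because row 3 (B) matches it in the join; every other row finds no match and is kept.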

edited Jul 18, 2013 at 17:39

answered Jul 18, 2013 at 17:31

Akash

