
I need to select some values, group them and order by multiple fields. Here is my fiddle: http://sqlfiddle.com/#!2/a80eb/3

What I need to achieve is to select, for a given value of client_mac and for each distinct drone_id, one row from the table packet_data. This row should contain the client_mac, the drone_id and the most frequent value of the antenna_signal column for that combination of drone_id and client_mac.

The column drone_id is a foreign key into the table drones, which has a column map_id. I only need to consider those rows from the packet_data table whose drone has a certain map_id in the drones table.

My desired result should be this:

CLIENT_MAC          DRONE_ID    ANTENNA_SIGNAL
3c:77:e6:17:9d:1b   1           -37
3c:77:e6:17:9d:1b   2           -57

My current SQL query is:

SELECT `packet_data`.`client_mac`,
       `packet_data`.`drone_id`,
       `packet_data`.`antenna_signal`,
       count(*) AS `count`
FROM `packet_data`
JOIN `drones` ON `packet_data`.`drone_id`=`drones`.`custom_id`
WHERE `drones`.`map_id` = 11
  AND `client_mac`="3c:77:e6:17:9d:1b"
GROUP BY drone_id,
         `packet_data`.`antenna_signal`
ORDER BY `packet_data`.`drone_id`,
         count(*) DESC

And my current result:

CLIENT_MAC          DRONE_ID    ANTENNA_SIGNAL
3c:77:e6:17:9d:1b   1           -37
3c:77:e6:17:9d:1b   1           -36
3c:77:e6:17:9d:1b   2           -57
3c:77:e6:17:9d:1b   2           -56
  • I like the result, but not the way you achieved it :) I need the value of "antenna_signal" that repeats the most times, not the one with the smallest value. Commented Jun 17, 2014 at 20:19
  • Yeah, that's why I deleted it. Commented Jun 17, 2014 at 20:20
  • Can you please go deeper? An example maybe? I have no experience with ties yet. Commented Jun 17, 2014 at 20:39
  • He means when the count values are equal... Commented Jun 17, 2014 at 20:40
  • Aaaah .. my insufficient English :) In that case AVG() would be great, but one random value from the ones with the same count would work too. Commented Jun 17, 2014 at 20:46

1 Answer


You can get your desired result with a not very nice correlated subquery (on top of a subquery). I don't know how well it will scale with a huge amount of data:

SELECT
    -- the desired columns
    client_mac,
    drone_id,
    antenna_signal,
    amount               -- I added this so I could easily check the result
FROM (
    -- give me the count of every value of the antenna_signal column 
    -- for each combination of client_mac, drone_id and antenna_signal
    SELECT 
        client_mac,
        antenna_signal,
        drone_id,
        COUNT(antenna_signal) AS amount
    FROM
        packet_data
    WHERE
        client_mac = '3c:77:e6:17:9d:1b'
    GROUP BY
        client_mac,
        drone_id,
        antenna_signal
) as1
WHERE
    -- but I want only those rows with the highest count of equal antenna_signal
    -- values per client_mac and drone_id
    amount = (
        SELECT 
            MAX(as2.amount)
        FROM (
            SELECT 
                pd2.client_mac,
                pd2.antenna_signal,
                pd2.drone_id,
                COUNT(pd2.antenna_signal) AS amount
            FROM
                packet_data pd2
            WHERE
                client_mac = '3c:77:e6:17:9d:1b'
            GROUP BY
                client_mac,
                drone_id,
                antenna_signal
            ) as2
        WHERE 
            as1.client_mac = as2.client_mac AND as1.drone_id = as2.drone_id
);

It shouldn't be too difficult to join other tables if desired. Note that this will return both rows if there are two antenna_signal values with an equal count for the same client_mac and drone_id. See it in the updated fiddle: http://sqlfiddle.com/#!2/a80eb/80
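For completeness, here is a hedged sketch of how the drones join and a one-row-per-group restriction could look on MySQL 8.0+ (window functions were not available when this answer was written). The table and column names are taken from the question; ties are resolved arbitrarily, which the comments above say is acceptable:

SELECT client_mac, drone_id, antenna_signal, amount
FROM (
    SELECT
        pd.client_mac,
        pd.drone_id,
        pd.antenna_signal,
        COUNT(*) AS amount,
        -- pick exactly one row per (client_mac, drone_id),
        -- preferring the most frequent antenna_signal; ties broken arbitrarily
        ROW_NUMBER() OVER (
            PARTITION BY pd.client_mac, pd.drone_id
            ORDER BY COUNT(*) DESC
        ) AS rn
    FROM packet_data pd
    JOIN drones d ON pd.drone_id = d.custom_id
    WHERE d.map_id = 11
      AND pd.client_mac = '3c:77:e6:17:9d:1b'
    GROUP BY pd.client_mac, pd.drone_id, pd.antenna_signal
) ranked
WHERE rn = 1;

ROW_NUMBER() guarantees exactly one row per (client_mac, drone_id); if you wanted to keep all tied values instead, RANK() would return every antenna_signal that shares the highest count.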


4 Comments

Thanks a lot! I expect to have several million rows :) Do you think MySQL will handle it?
@filo891 It's a dependent subquery, so I don't believe it will scale well; I was worried about your amount of data too. You could try selecting the innermost subquery into a temporary table and querying that instead. Even then it's a correlated subquery and may well take a long time to execute. How often will you run this?
So let me explain: my DB contains data collected in real time from several WiFi sniffers, and I have an application which does real-time localization of WiFi clients. I need to run this query for all currently present MAC addresses (an interval of 30 seconds of collected data, possibly as many as 200 different addresses) at least once every 10-20 seconds to keep the localization as close to real time as possible. I will have to reduce the amount of collected/stored data for sure, but I need to keep some history for at least a few days, so millions of rows is a realistic amount.
@filo891 That sounds reasonable. For the real-time localization you only need recent data, so I would either dump all data to a history table and clear the recent data, or do it the other way around and copy only the recent data into a temporary table for the real-time analysis. I think I would prefer the second option. Test it with real data, then optimize.
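A minimal sketch of the temporary-table approach described in the last comment. It assumes packet_data has a timestamp column, here called captured_at, which does not appear in the question and is purely hypothetical:

-- Copy only the last 30 seconds of data into a temporary table
-- (captured_at is a hypothetical timestamp column, not shown in the question).
CREATE TEMPORARY TABLE recent_packets AS
SELECT client_mac, drone_id, antenna_signal
FROM packet_data
WHERE captured_at >= NOW() - INTERVAL 30 SECOND;

-- Then run the aggregation (or the full mode-selection query from the answer)
-- against recent_packets instead of packet_data.
SELECT client_mac, drone_id, antenna_signal, COUNT(*) AS amount
FROM recent_packets
GROUP BY client_mac, drone_id, antenna_signal
ORDER BY client_mac, drone_id, amount DESC;

The temporary table is dropped automatically when the session ends, so the periodic real-time job can simply recreate it on each run.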
