1

I have this table

CREATE TABLE gotrax1.wifi_log (
    WifiID int(11) NOT NULL AUTO_INCREMENT,
    UnitID int(11) DEFAULT NULL,
    ServerTime timestamp NULL DEFAULT CURRENT_TIMESTAMP (),
    FileTime bigint(20) DEFAULT NULL,
    WLANTYPE text DEFAULT NULL,
    MACSRC varchar(25) DEFAULT NULL,
    MACDST varchar(25) DEFAULT NULL,
    BSSID varchar(25) DEFAULT NULL,
    SIG int(11) DEFAULT NULL,
    ESSID text DEFAULT NULL,
    PRIMARY KEY (WifiID)
)

I need to run this query on it

SELECT 
    COUNT(DISTINCT(MACDST)) AS MACDST,
    COUNT(DISTINCT(MACSRC)) AS MACSRC,
    COUNT(DISTINCT(BSSID)) AS BSSID,
    COUNT(DISTINCT(MACDST))-COUNT(DISTINCT(MACSRC)) AS UnitDIFF,
    UnitID, FileTime, WLANTYPE
FROM wifi_log 
GROUP BY FileTime,UnitID,WLANTYPE
ORDER BY FileTime DESC;

It is dog slow and does a full filesort. Normally I know to add an index that follows the order of the WHERE clause, but I have no idea how to do that for this query and this table to avoid the filesort. Any suggestions would be terrific, thank you.

  • COUNT(DISTINCT) will also slow things down. Commented Dec 14, 2019 at 20:55
  • Run EXPLAIN SELECT ... to see the execution plan. Commented Dec 14, 2019 at 20:56
  • You can have two indexes: a composite index on [FileTime, UnitID, WLANTYPE] and one on FileTime in descending order. Commented Dec 14, 2019 at 21:02
  • Thanks. I had run EXPLAIN prior to posting, and all it says is "Using filesort" (which I mentioned above). Commented Dec 14, 2019 at 22:17
  • Thanks, I added these two indexes (I had put DESC on FileTime, but it didn't seem to stick). No difference, it is still using filesort: ALTER TABLE gotrax1.wifi_log ADD INDEX IDX_wifi_log_FileTime (FileTime); ALTER TABLE gotrax1.wifi_log ADD INDEX IDX_wifi_log2 (FileTime, UnitID, WLANTYPE (1)); Commented Dec 14, 2019 at 22:19

3 Answers

1
+50

You can't create an index on WLANTYPE as it is, because if you try to index a TEXT or BLOB, you get this error:

ERROR 1170 (42000): BLOB/TEXT column 'wlantype' used in key specification without a key length

I would question whether you need WLANTYPE to be TEXT. Perhaps a shorter VARCHAR would be just as good.

alter table wifi_log modify wlantype varchar(10);
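
Before shrinking the column, it may be worth checking that no stored value is longer than the new limit. Depending on the SQL mode, MySQL will either reject the ALTER or truncate longer values. A minimal check, assuming VARCHAR(10) is the intended target length:

```sql
-- Longest WLANTYPE value currently stored, in characters
SELECT MAX(CHAR_LENGTH(WLANTYPE)) AS max_len
FROM wifi_log;

-- Number of rows that would not fit into VARCHAR(10)
-- (10 is only the length assumed in this answer; adjust as needed)
SELECT COUNT(*) AS too_long
FROM wifi_log
WHERE CHAR_LENGTH(WLANTYPE) > 10;
```

If too_long is greater than zero, a larger VARCHAR length is probably the safer choice.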

Then you can add a covering index:

alter table wifi_log add key (filetime,unitid,wlantype,macdst,macsrc,bssid);

Also get rid of the ORDER BY FileTime so you don't have to sort the result. If the rows aren't already in the order you want, sort them in your application after fetching.

EXPLAIN
SELECT 
    COUNT(DISTINCT(MACDST)) AS MACDST,
    COUNT(DISTINCT(MACSRC)) AS MACSRC,
    COUNT(DISTINCT(BSSID)) AS BSSID,
    COUNT(DISTINCT(MACDST))-COUNT(DISTINCT(MACSRC)) AS UnitDIFF,
    UnitID, FileTime, WLANTYPE
FROM wifi_log 
GROUP BY FileTime,UnitID,WLANTYPE
ORDER BY NULL\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wifi_log
   partitions: NULL
         type: index
possible_keys: FileTime
          key: FileTime
      key_len: 366
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index

The type: index in this EXPLAIN report shows that it still has to scan the whole index, which is nearly as expensive as a table scan. But that's natural for your query, which needs to get counts from every row.

The advantage of making this an index scan is that it may have to examine fewer pages. One index, even on 6 columns, is smaller than the whole table.

Getting rid of the filesort will also help.
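
To see whether a sort is actually performed for the GROUP BY or the ORDER BY, the JSON form of EXPLAIN can be used. It reports a "using_filesort" flag for the grouping and ordering steps. A minimal sketch against the query from the question (the exact layout of the JSON output varies between MySQL versions, so treat the field names as something to look for rather than a fixed format):

```sql
-- Look for "using_filesort" and "using_temporary_table"
-- in the grouping/ordering sections of the output.
EXPLAIN FORMAT=JSON
SELECT
    COUNT(DISTINCT(MACDST)) AS MACDST,
    COUNT(DISTINCT(MACSRC)) AS MACSRC,
    COUNT(DISTINCT(BSSID)) AS BSSID,
    COUNT(DISTINCT(MACDST))-COUNT(DISTINCT(MACSRC)) AS UnitDIFF,
    UnitID, FileTime, WLANTYPE
FROM wifi_log
GROUP BY FileTime,UnitID,WLANTYPE
ORDER BY FileTime DESC\G
```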


8 Comments

I doubt if there is an extra sort for the ORDER BY since it adequately matches the GROUP BY. This would indicate whether there is an extra sort: EXPLAIN FORMAT=JSON SELECT...
I see you're right! I get "using_filesort": false, in the JSON EXPLAIN report.
So the best optimization is to use the index-scan instead of a table-scan. I doubt that will make much difference at the end of the day.
First, "covering" gives a speedup. Then there are two potential sorts to try to avoid. I would expect the first 3 columns of your index to avoid the 'filesort' for GROUP BY; does it? Then, the similarity between GROUP BY and ORDER BY should obviate the filesort for ORDER BY. And, in getting rid of the first filesort, the "temp table" may go away. So I see 4 possible speedups. (Need to see the schema and EXPLAIN FORMAT=JSON to know for sure.) Also, is it InnoDB?
It's always InnoDB. What else? :-) I didn't populate the table with any data, which could also make a difference. If you want to do all that work and post your own answer, be my guest. :)
0

I have a technique for approximating COUNT(DISTINCT ...) from summarized data. Could you build daily summaries of the data and then roll them up for the totals? That is easy for COUNT (sum of counts) and SUM (sum of sums), but rather tricky for 'uniques'. The technique gives only approximations, usually within 1% of the exact result. Here is an overview: http://mysql.rjweb.org/doc.php/uniques

Comments

0

I would suggest creating a column to hold a hash of WLANTYPE.

Add an index on the hash column and a trigger to set it on insert and update.

Then change your query slightly:

SELECT 
    COUNT(DISTINCT(MACDST)) AS MACDST,
    COUNT(DISTINCT(MACSRC)) AS MACSRC,
    COUNT(DISTINCT(BSSID)) AS BSSID,
    COUNT(DISTINCT(MACDST))-COUNT(DISTINCT(MACSRC)) AS UnitDIFF,
    UnitID, FileTime, max(WLANTYPE) as WLANTYPE
FROM wifi_log 
GROUP BY FileTime,UnitID,WLANTYPEHash
ORDER BY FileTime DESC;

see fiddle:

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=16e86ea77eb242d2339d6050a1772b60
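
The hash column, index, and triggers described above could be set up roughly as follows. This is only a sketch, not a copy of the fiddle: the column name WLANTYPEHash comes from the query above, while the choice of MD5, the CHAR(32) type, and the index and trigger names are assumptions made for illustration.

```sql
-- Hash column plus an index matching the GROUP BY columns
-- (MD5 and CHAR(32) are assumed; any stable hash would do)
ALTER TABLE wifi_log
    ADD COLUMN WLANTYPEHash CHAR(32) DEFAULT NULL,
    ADD INDEX IDX_wifi_log_hash (FileTime, UnitID, WLANTYPEHash);

-- Keep the hash in sync on insert and update
-- (trigger names are hypothetical)
CREATE TRIGGER wifi_log_hash_bi BEFORE INSERT ON wifi_log
FOR EACH ROW SET NEW.WLANTYPEHash = MD5(NEW.WLANTYPE);

CREATE TRIGGER wifi_log_hash_bu BEFORE UPDATE ON wifi_log
FOR EACH ROW SET NEW.WLANTYPEHash = MD5(NEW.WLANTYPE);

-- Backfill rows that existed before the column was added
UPDATE wifi_log SET WLANTYPEHash = MD5(WLANTYPE);
```

Since MD5(NULL) is NULL, rows without a WLANTYPE still fall into a single group, as they do in the original query.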

Comments
