1

I am trying to optimize a SQL query on a large event table (10 million+ rows) for date range searches. The table already has a unique index on (lid, did, measurement, date). The query below tries to get the events for three types of measurement (Kilowatts, Current, and Voltage) at every 2-second interval in the date column:

SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events 
WHERE lid = 1 
  and did = 1
  and measurement IN ("Voltage") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Current") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Kilowatts") 
group by timekey

This is the table I am querying:

id  |  lid   |   did   |   measurement  |  date
=============================================================
1   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:00
2   |  1     |   1     |   Current      | 2020-04-27 00:00:00
3   |  1     |   1     |   Voltage      | 2020-04-27 00:00:00
4   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:01
5   |  1     |   1     |   Current      | 2020-04-27 00:00:01
6   |  1     |   1     |   Voltage      | 2020-04-27 00:00:01
7   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:02
8   |  1     |   1     |   Current      | 2020-04-27 00:00:02
9   |  1     |   1     |   Voltage      | 2020-04-27 00:00:02

The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but I am using UNION to look up the different measurements, and I believe that might not be the optimal way to do it.

Can any SQL expert help me tune this query to improve its performance?

5
  • You have a GROUP BY with a SELECT *. That should fail -- in SQL in general and in the more recent versions of MySQL. Your query doesn't make sense. Commented Apr 27, 2020 at 23:17
  • I'm using MySQL 5.7. Why would GROUP BY with SELECT * fail in more recent versions? Why does it not make sense? Commented Apr 28, 2020 at 1:05
  • Because you have unaggregated columns in the SELECT that are not in the GROUP BY. Commented Apr 28, 2020 at 2:17
  • Also, there is a grouping issue. You have the columns that show the date, what, and where, but which values do you want to show per Current, Voltage, and Kilowatts? If entries come in every second, do you want the min, max, avg, or all of these per measurement, to look for spikes or something? The post is missing the context of the remaining columns and of how the GROUP BY should shape your final results. Please edit your existing post and add some sample data showing what would be summed and your expected results. The mere fact that a record exists is one thing, but there is no context beyond that. Commented Apr 28, 2020 at 2:59
  • What should happen if a particular second is missing from the input table? Commented Apr 30, 2020 at 5:19
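
The point raised in the comments above can be illustrated with a sketch. Under ONLY_FULL_GROUP_BY (part of the default sql_mode in MySQL 5.7.5 and later), every selected column has to appear in the GROUP BY or be wrapped in an aggregate. The query below is one possible shape, not necessarily what the asker needs: it assumes the earliest timestamp in each 2-second bucket is the value of interest, and the aliases first_date and readings are made up for illustration. Because measurement is part of the GROUP BY, a single statement covers all three measurements and the UNION is not needed.

```sql
-- Sketch only: first_date and readings are illustrative aliases.
SELECT
    measurement,
    FLOOR(UNIX_TIMESTAMP(date) / 2) AS timekey,   -- 2-second bucket
    MIN(date) AS first_date,                      -- earliest reading in the bucket
    COUNT(*)  AS readings                         -- number of rows in the bucket
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
GROUP BY measurement, timekey;
```

Any other column from the table would need its own aggregate (MIN, MAX, AVG, and so on) chosen according to what the result is meant to show.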

3 Answers

1

You have one record every second for each and every measurement, and you want to select one record every two seconds.

You could try:

select *
from events
where 
    lid = 1 
    and did = 1 
    and measurement IN ('Voltage', 'Current')
    and extract(second from date) % 2 = 0

This would select records that have an even second part.

Alternatively, if you always have one record every second, another option is row_number() (this requires MySQL 8.0):

select *
from (
    select 
        e.*, 
        row_number() over(partition by measurement order by date) rn
    from events
    where 
        lid = 1 
        and did = 1 
        and measurement IN ('Voltage', 'Current')
) t
where rn % 2 = 1

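This is a bit less accurate than the previous query though: it relies on there being exactly one record per second, so a single missing reading shifts which rows are picked from that point on.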


4 Comments

Thanks, the first approach works for 1- and 2-second intervals because of the operation extract(second from date) % 2 = 0. What if I want to apply an interval of 30 seconds, 15 minutes, half an hour, 6 hours, etc.? I tried (minutes from date) % 15 = 0, but it does not work as I expected.
30 seconds interval would be: extract(second from date) % 30 = 0. For intervals greater than one minute, you could use unix_timestamp() - for 15 minutes, something like unix_timestamp(date) % (60 * 15) = 0, and so on.
Would the query be something like this? select * from events where lid = 1 and did = 9999 and measurement IN ("Voltage", "Current", "Kilowatts") and unix_timestamp(date) % (60*15) = 0 But this query returns no results for me. I cannot get data for every 15-minute interval.
The row_number approach assumes that there is a reading every second. This kind of data can be somewhat flaky.
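
The interval arithmetic discussed in these comments can be generalized with a single parameter. The sketch below assumes a user variable named @step (a hypothetical name) that holds the interval in seconds. It only matches rows whose timestamp falls exactly on a multiple of the interval. UNIX_TIMESTAMP() interprets a DATETIME value in the session time zone, so for long intervals such as 6 hours the boundaries may not line up with local clock time. If the query unexpectedly returns nothing, it may be worth checking that the filtered lid and did actually have rows on those boundaries.

```sql
-- Sketch only: @step is an illustrative user variable (interval in seconds).
SET @step = 15 * 60;  -- 15 minutes; use 30 for 30 seconds, 6 * 3600 for 6 hours

SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND UNIX_TIMESTAMP(date) % @step = 0;  -- keep rows exactly on an interval boundary
```

The modulo test cannot use an index on date by itself, so adding an explicit date range to the WHERE clause is probably what lets the existing (lid, did, measurement, date) index limit the rows that have to be examined.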
0

Your query is actually three queries combined into one. Luckily they all select rows of data based on similar columns. If you want to make this query run fast you can add the following index:

create index ix1 on events (lid, did, measurement);
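
One thing worth checking before adding this index: the question says a unique index on (lid, did, measurement, date) already exists, and (lid, did, measurement) is its leftmost prefix, so the optimizer may already be able to use it for this filter. A minimal way to see which index is chosen, assuming the table and values from the question:

```sql
-- Sketch: ask the optimizer which index it picks for the filter.
EXPLAIN
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts');
```

The key column of the EXPLAIN output names the index in use, and the rows column gives the optimizer's estimate of how many rows it expects to examine.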

1 Comment

Yes, I created the index already. I am looking for a way to increase the speed beyond what the index gives.
0

In addition to the above suggestions, changing the PRIMARY KEY will give you a little more performance:

PRIMARY KEY(lid, did, date, measurement)

and toss id.

Caveat: there could be hiccups if two readings come in at exactly the same "second". This could easily happen if one reading comes in just after the clock ticks, and the next comes in just before the next tick.
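
As a rough sketch, the change could look like the statement below. It rests on several assumptions that are not confirmed by the question: that id is currently the single-column primary key, that nothing else (foreign keys, application code) depends on id, and that the four key columns are NOT NULL. On a table of this size the rebuild can take a while, so trying it on a copy first seems prudent.

```sql
-- Sketch only: assumes id is the current single-column primary key,
-- nothing else references it, and the four key columns are NOT NULL.
ALTER TABLE events
    DROP COLUMN id,                                  -- dropping id also drops the primary key defined on it
    ADD PRIMARY KEY (lid, did, date, measurement);   -- new key, ordered by lid, did, then date
```

Since the question states that a unique index on the same four columns already exists, two readings for the same measurement within the same second would presumably already be rejected today, so the new key should not introduce a new kind of collision.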
