1

I have a MySQL 8.0.23 table called events with the following schema:

eventUID - integer NOT NULL

camtimestamp - MySQL datetime stamp

direction - string - either "In" or "Out"

propUID - integer

In a single SELECT statement, I am trying to determine by hour for the last 24 hours how many cars are "In" and how many are "Out". Here is what I am trying (does not yet have the 24 hour limit built in).

select camtimestamp,count(*) from events where direction ="In" and propUID = 7 group by year(camtimestamp),month(camtimestamp),day(camtimestamp),hour(camtimestamp);

And this is a sample of what I am getting.

2022-02-14 22:02:40 38
2022-02-14 21:56:56 15
2022-02-14 20:55:30 47
2022-02-14 19:59:18 51
2022-02-14 18:59:50 36
2022-02-14 17:52:04 10
2022-02-14 16:58:01 16
2022-02-14 15:59:00 36
2022-02-14 14:58:52 44

I also have a table called datehourlist with which I can join in my SELECT.

Sample data:

2019-05-01 00:00:00
2019-05-01 01:00:00
2019-05-01 02:00:00
2019-05-01 03:00:00
2019-05-01 04:00:00
2019-05-01 05:00:00
2019-05-01 06:00:00
2019-05-01 07:00:00
2019-05-01 08:00:00

Also:

mysql> select min(datehour) from datehourlist;
+---------------------+
| min(datehour)       |
+---------------------+
| 2019-05-01 00:00:00 |
+---------------------+
1 row in set (0.02 sec)

mysql> select max(datehour) from datehourlist;
+---------------------+
| max(datehour)       |
+---------------------+
| 2040-12-31 00:00:00 |
+---------------------+
1 row in set (0.02 sec)

datehourlist has every hour in it from May 1, 2019 until December 31, 2040.

This is a sample of what I really want from this:

Column 1 below is a rounded grouped timestamp (vs col 1 above being a non-rounded, actual timestamp)

Column 1 below does not skip an hour if there is no data from that hour.

Column 2 below is the "In" count for that hour.

Column 3 below is the "Out" count for that hour.

2019-05-02 06:00:00 5 10
2019-05-02 07:00:00 127 10
2019-05-02 08:00:00 0 0 
2019-05-02 09:00:00 115 10
2019-05-02 10:00:00 71 10
2019-05-02 11:00:00 147 10
2019-05-02 12:00:00 140 10

What SELECT statement should I use to get the output I need?

Also, how would I optimize that SELECT statement?

Within events, I have 500k events and growing by 100s everyday.

Thank you in advance for your help.

Thank you for a great solution so quickly.

SELECT dhl.datehour datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM datehourlist dhl
  LEFT JOIN events  ev
       ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY dhl.datehour;
2
  • You should provide table structure Commented Mar 14, 2022 at 17:51
  • ... and A) sample data for that table i.e. CREATE TABLE and INSERT statements and B) expected results based on the sample Commented Mar 14, 2022 at 17:52

1 Answer 1

1

First, we need a trunc_to_hour() function that takes an arbitrary DATETIME or TIMESTAMP value and gives back the beginning of its hour. That is this.

DATE_FORMAT(camtimestamp, '%Y-%m-%d %H:00:00') 

Second, we need a WHERE expression that can handle the most recent 24 hours. That is this.

WHERE camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
  AND camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')

For the example timestamp 2021-03-14 16:04:30 this gives the following.

WHERE camtimestamp >= `2021-03-13 16:00:00`
  AND camtimestamp <  '2021-03-14 16:00:00`

That is, it chooses records for the most recent 24 complete clock hours. You may have to adjust this WHERE expression if you want the hours to date.

Third, we need conditional sums (for In and Out).

It happens that the expression direction = 'In' gives 1 when direction is In, 0 when direction is some other string (like Out), and NULL if direction itself is NULL. So

   SUM(direction='In') 

counts the rows meeting that criterion.

Fourth, when the SUM is null, we want to show zero. Like this.

   COALESCE(SUM(direction='In'),0)

Fifth, we can put it together something like this:

SELECT DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM events ev
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00')

That gives you your result set. But it still might be missing some hours if there are no records for those hours.

So, sixth, we can join that to your pre-existing hourly calendar table like this:

SELECT dhl.datehour datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM datehourlist dhl
  LEFT JOIN events  ev
       ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY dhl.datehour

And that should do it. (Not debugged.)

Sign up to request clarification or add additional context in comments.

2 Comments

Very little debugging needed. In the WHERE clause, I changed the hard-coded date out with the dynamic example you gave above that. And voila! It worked!
Oops, editing error for the wrong datetime filter. Fixed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.