I've been stumped trying to optimize this query and was hoping some of you database wizards might have some insight. Here is the setup.
I'm using TimescaleDB, and I have a wide table of sensor data (`sensor_data`) that looks like this:
| time | sensor_id | wind_speed | wind_direction |
|---|---|---|---|
| '2023-12-18 12:15:00' | '1' | NULL | 176 |
| '2023-12-18 12:13:00' | '1' | 4 | 177 |
| '2023-12-18 12:11:00' | '1' | 3 | NULL |
| '2023-12-18 12:09:00' | '1' | 8 | 179 |
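I want a query that returns the most recent non-null value of each column in a set, filtered on sensor_id. NULLs are skipped per column, so the values can come from different rows. For the data above (sensor_id 1), wind_speed should come from the 12:13 row (the 12:15 value is NULL) and wind_direction from the 12:15 row, so the query should return: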
| wind_speed | wind_direction |
|---|---|
| 4 | 176 |
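To pin down these semantics, here is a minimal self-contained sketch. It loads the sample rows into a hypothetical scratch table `sensor_data_demo` (a stand-in for the real table) and runs the same per-column "newest non-null row" lookup the query below uses. The 7-day time filter is left out because the sample timestamps are from 2023.

```sql
-- Hypothetical scratch table reproducing the sample rows above.
CREATE TEMP TABLE sensor_data_demo (
    "time"         timestamp NOT NULL,
    sensor_id      text      NOT NULL,
    wind_speed     integer,
    wind_direction integer
);

INSERT INTO sensor_data_demo ("time", sensor_id, wind_speed, wind_direction) VALUES
    ('2023-12-18 12:15:00', '1', NULL, 176),
    ('2023-12-18 12:13:00', '1', 4,    177),
    ('2023-12-18 12:11:00', '1', 3,    NULL),
    ('2023-12-18 12:09:00', '1', 8,    179);

-- Each column gets its own lookup: newest row where that column IS NOT NULL.
SELECT
    (SELECT wind_speed FROM sensor_data_demo
      WHERE sensor_id = '1' AND wind_speed IS NOT NULL
      ORDER BY "time" DESC LIMIT 1) AS wind_speed,      -- 4 (from 12:13)
    (SELECT wind_direction FROM sensor_data_demo
      WHERE sensor_id = '1' AND wind_direction IS NOT NULL
      ORDER BY "time" DESC LIMIT 1) AS wind_direction;  -- 176 (from 12:15)
```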
My current query looks like this. I query sensor_ids in batches of 10, with one scalar subquery per column per sensor:
```sql
SELECT
(SELECT wind_speed FROM sensor_data WHERE sensor_id = '1' AND "time" > now()-'7 days'::interval AND wind_speed IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_speed,
(SELECT wind_direction FROM sensor_data WHERE sensor_id = '1' AND "time" > now()-'7 days'::interval AND wind_direction IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_direction,
(SELECT wind_speed FROM sensor_data WHERE sensor_id = '2' AND "time" > now()-'7 days'::interval AND wind_speed IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_speed_two,
(SELECT wind_direction FROM sensor_data WHERE sensor_id = '2' AND "time" > now()-'7 days'::interval AND wind_direction IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_direction_two,
-- ... same two subqueries for each remaining sensor_id in the batch ...
(SELECT wind_speed FROM sensor_data WHERE sensor_id = '10' AND "time" > now()-'7 days'::interval AND wind_speed IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_speed_ten,
(SELECT wind_direction FROM sensor_data WHERE sensor_id = '10' AND "time" > now()-'7 days'::interval AND wind_direction IS NOT NULL ORDER BY "time" DESC LIMIT 1) as wind_direction_ten;
```
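For comparison, the same lookups can be expressed once per column instead of once per column per sensor, by joining each sensor_id in the batch to a `LATERAL` subquery. This is a sketch, not a tested answer: it returns one row per sensor rather than 20 columns, and it assumes `sensor_id` is a text column (as the quoted literals suggest; use an integer array if it is numeric). The planner then sees two subqueries instead of twenty, which may reduce planning time, but whether it does on this hypertable is something `EXPLAIN (ANALYZE)` would have to confirm.

```sql
-- Sketch: one row per sensor; the id list below is an example batch.
SELECT s.sensor_id, ws.wind_speed, wd.wind_direction
FROM unnest(ARRAY['1','2','3','4','5','6','7','8','9','10'])
     WITH ORDINALITY AS s(sensor_id, ord)
-- Newest non-null wind_speed in the window (NULL if there is none).
LEFT JOIN LATERAL (
    SELECT d.wind_speed
    FROM sensor_data d
    WHERE d.sensor_id = s.sensor_id
      AND d."time" > now() - interval '7 days'
      AND d.wind_speed IS NOT NULL
    ORDER BY d."time" DESC
    LIMIT 1
) ws ON true
-- Newest non-null wind_direction in the window (NULL if there is none).
LEFT JOIN LATERAL (
    SELECT d.wind_direction
    FROM sensor_data d
    WHERE d.sensor_id = s.sensor_id
      AND d."time" > now() - interval '7 days'
      AND d.wind_direction IS NOT NULL
    ORDER BY d."time" DESC
    LIMIT 1
) wd ON true
ORDER BY s.ord;  -- keep the order of the input batch
```

From a client that supports bound parameters, the array literal can be replaced with `unnest($1::text[])`, so the SQL text stays the same regardless of batch size.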
The table has 1,000 distinct sensor_ids, each reporting every 2 minutes, so it holds hundreds of millions of rows.
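The row counts implied by those figures work out as follows (simple arithmetic from the stated 1,000 sensors and 2-minute interval). Each `LIMIT 1` lookup only needs to read the newest few index entries out of the roughly 5,040 rows per sensor in the 7-day window.

```latex
\begin{aligned}
\text{rows per sensor per day} &= \frac{24 \cdot 60\ \text{min}}{2\ \text{min}} = 720 \\
\text{rows per day, all sensors} &= 1000 \times 720 = 720{,}000 \\
\text{rows in the 7-day window} &= 7 \times 720{,}000 = 5{,}040{,}000 \\
\text{rows per year} &= 365 \times 720{,}000 = 262{,}800{,}000 \approx 2.6 \times 10^{8}
\end{aligned}
```

So "hundreds of millions of rows" corresponds to roughly a year or more of history at this reporting rate.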
I've created an index on `(sensor_id, time DESC)` to speed up the lookups. With it, the query takes roughly 400 ms to plan and 50 ms to execute.
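For reference, the index described above looks like the first statement below (the index names are hypothetical). The two partial indexes after it are an optional idea, not something from the question: each one contains only rows where its column is non-null, so that column's `ORDER BY "time" DESC LIMIT 1` lookup can take the first matching entry instead of stepping past NULL rows. Whether they pay off depends on how often values are NULL, and each extra index adds write overhead on a table ingesting about 720,000 rows a day.

```sql
-- The composite index from the question (name is hypothetical).
CREATE INDEX sensor_data_sensor_id_time_idx
    ON sensor_data (sensor_id, "time" DESC);

-- Optional, hypothetical partial indexes: one per value column.
-- Only rows with a non-null value are indexed.
CREATE INDEX sensor_data_wind_speed_latest_idx
    ON sensor_data (sensor_id, "time" DESC)
    WHERE wind_speed IS NOT NULL;

CREATE INDEX sensor_data_wind_direction_latest_idx
    ON sensor_data (sensor_id, "time" DESC)
    WHERE wind_direction IS NOT NULL;
```

The planner can only use a partial index when the query's `WHERE` clause implies the index predicate, which the `... IS NOT NULL` conditions in each subquery do.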
How can I rewrite the query (or which indexes can I add) to reduce both planning and execution time?
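Since planning time is about 8× execution time here, it may help to see where the planning effort goes. A sketch, assuming TimescaleDB 2.x: `EXPLAIN (ANALYZE, BUFFERS)` reports "Planning Time" and "Execution Time" separately and lists every chunk the plan touches, and the `timescaledb_information.chunks` view shows how many chunks the hypertable has. If chunks outside the 7-day window still appear in the plan, each of the twenty subqueries is paying to plan them.

```sql
-- Plan and timing for one of the scalar subqueries from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT wind_speed
FROM sensor_data
WHERE sensor_id = '1'
  AND "time" > now() - interval '7 days'
  AND wind_speed IS NOT NULL
ORDER BY "time" DESC
LIMIT 1;

-- Total number of chunks in the hypertable.
SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
```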
… `sensor` with one row for every relevant `sensor_id`? How often do you query? Are rows immutable once written (and never deleted)?