So, I have a task to produce a design for storing data retrieved from sensors: temperature, pressure, roll, etc.
The first prototype was quite simple. I created a TimeSync table with 2 columns: an auto-increment ID and Time. Then for each value I created a table of 2 columns: a foreign key to the TimeSync ID, and the value as a float.
Reading the data back was quite easy: I would filter the TimeSync table by the date range I'm interested in and join the tables for the values I want to read back. The issue was disk space usage. We are quite limited on disk space, and storing 12 parameters for a year at 1 Hz logging used ~100 GB in SQLite.
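For reference, the prototype layout looked roughly like this (SQLite DDL; the table and column names here are illustrative, one value table per parameter):

```sql
CREATE TABLE TimeSync (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Time TEXT NOT NULL
);

-- One of these per logged parameter, all referencing the same TimeSync row:
CREATE TABLE Temperature (
    TimeSyncID INTEGER NOT NULL REFERENCES TimeSync(ID),
    Value      REAL NOT NULL
);
```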
So we made a decision to move to PostgreSQL and apply slightly more complicated logic. The thing is, there is quite a lot of duplicated data; take temperature, for example. It doesn't change every second, it might change once a minute, so there is no need to store it so often. The ideal solution would be to store the first value received, then on each subsequent value check whether it has changed, and only write to the database if it has.
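The write-on-change logic happens on the application side before anything hits the database. A minimal sketch in Python (the function name and the optional tolerance parameter are my own, not part of any existing code):

```python
def changed_only(samples, tolerance=0.0):
    """Yield (time, value) pairs, keeping a sample only when the value
    differs from the last *stored* value by more than `tolerance`."""
    last = None
    for t, v in samples:
        if last is None or abs(v - last) > tolerance:
            yield (t, v)
            last = v

# 1 Hz readings; the temperature only actually changes three times:
readings = [(0, 18.9), (1, 18.9), (2, 18.9), (3, 17.9), (4, 17.9), (5, 19.9)]
print(list(changed_only(readings)))  # → [(0, 18.9), (3, 17.9), (5, 19.9)]
```

A nonzero tolerance would additionally suppress writes for noise-level jitter, at the cost of some accuracy on readback.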
So that means values can be stored at different effective frequencies; say roll changes every 1 s while temperature changes every 60 s. Now my issue is how to combine that data into a single query.
My design so far is to store when each device was online and when it went offline. This will provide clues later for proper filtering.
Next, each value is stored in its own table consisting of 2 columns: Time and the value itself.
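In DDL the layout would look roughly like this (a sketch; apart from `data.device_2_p_3`, which appears in my query below, all names are placeholders):

```sql
CREATE TABLE activity (
    ID      serial PRIMARY KEY,
    DevID   integer NOT NULL,
    Online  timestamp NOT NULL,
    Offline timestamp
);

-- One of these per device/parameter pair:
CREATE TABLE data.device_2_p_3 (
    "timestamp" timestamp NOT NULL,
    value       real NOT NULL
);
```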
So, some examples. The activity table looks like:
ID  DevID  Online               Offline
1   1      2017-01-16 16:13:46  2017-01-16 16:24:38
13  1      2017-01-16 16:32:51  2017-01-16 16:42:16
and the data tables. Data Table 1:
Time Value
2017-01-16 16:13:59 18.9
2017-01-16 16:14:20 17.9
2017-01-16 16:15:08 19.9
Data Table 2:
Time Value
2017-01-16 16:13:57 348
2017-01-16 16:14:05 350
2017-01-16 16:14:17 353
I'm using generate_series from PostgreSQL, and it looks like what I need:
select *
-- Generate a series for the specified range at the specified interval
from generate_series(
        '2017-01-16 16:10'::timestamp,
        '2017-01-16 19:00'::timestamp,
        '1 second'::interval) as date
-- Join one data table
left outer join (
    select date_trunc('second', d."timestamp") as val1time,
           avg(d.value) as val1avg
    from data.device_2_p_3 d
    group by val1time
) results on date = results.val1time
-- Join the other data table
left outer join (
    select date_trunc('second', d."timestamp") as val2time,
           avg(d.value) as val2avg
    from data.device_1_p_1 d
    group by val2time
) results2 on date = results2.val2time
order by date asc;
And the readback is:
date     val1time  val1avg  val2time  val2avg
14:00.0  null      null     null      null
14:01.0  null      null     14:01.0   349
14:02.0  14:02.0   18.8     null      null
14:03.0  null      null     14:03.0   349.5
14:04.0  null      null     null      null
The issue is that I'm not able to interpolate, or carry forward the previous value, for the rows where the value is null but the device was active at that point. Any clues on how to solve this, or suggestions for improving the design, would be highly appreciated.