In my app I need to produce output (later converted to JSON) from temperature data that is gathered by sensors and stored in a PostgreSQL database.

The data is stored in device_history_log as follows (for data between 11am and noon):

row_id;deviceid;sensor_value_raw;last_update
104401;20865735;21.56;"2015-06-10 11:00:14"
104432;493417852;23.9;"2015-06-10 11:00:58"
104516;20865735;21.06;"2015-06-10 11:05:14"
104578;493417852;23.7;"2015-06-10 11:06:43"
104583;20865735;21.12;"2015-06-10 11:13:21"
104601;20865735;21.18;"2015-06-10 11:17:25"
104623;493417852;22.2;"2015-06-10 11:18:21"
104642;20865735;21.25;"2015-06-10 11:21:29"
104937;20865735;21.31;"2015-06-10 11:25:33"
105081;20865735;21.37;"2015-06-10 11:33:41"
105154;20865735;21.43;"2015-06-10 11:37:45"
105303;493417852;24;"2015-06-10 11:46:09"
105358;20865735;21.62;"2015-06-10 11:49:58"
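
For anyone who wants to reproduce this, the sample above can be loaded with something like the following. The column types are assumptions made for illustration (real for the readings, bigint for the device id), not the actual schema.

```sql
-- Assumed schema: the column types are guesses, adjust them to the real table.
CREATE TABLE device_history_log (
    row_id           integer PRIMARY KEY,
    deviceid         bigint    NOT NULL,
    sensor_value_raw real,
    last_update      timestamp NOT NULL
);

-- The thirteen sample rows listed above.
INSERT INTO device_history_log (row_id, deviceid, sensor_value_raw, last_update) VALUES
    (104401, 20865735,  21.56, '2015-06-10 11:00:14'),
    (104432, 493417852, 23.9,  '2015-06-10 11:00:58'),
    (104516, 20865735,  21.06, '2015-06-10 11:05:14'),
    (104578, 493417852, 23.7,  '2015-06-10 11:06:43'),
    (104583, 20865735,  21.12, '2015-06-10 11:13:21'),
    (104601, 20865735,  21.18, '2015-06-10 11:17:25'),
    (104623, 493417852, 22.2,  '2015-06-10 11:18:21'),
    (104642, 20865735,  21.25, '2015-06-10 11:21:29'),
    (104937, 20865735,  21.31, '2015-06-10 11:25:33'),
    (105081, 20865735,  21.37, '2015-06-10 11:33:41'),
    (105154, 20865735,  21.43, '2015-06-10 11:37:45'),
    (105303, 493417852, 24,    '2015-06-10 11:46:09'),
    (105358, 20865735,  21.62, '2015-06-10 11:49:58');
```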

I want to get 5-minute averages of the data, and so far I'm using this query:

SELECT grid.t5||'.000000' as ts, 
            CASE
                WHEN avg(t.sensor_value_raw) ISNULL THEN -1
                ELSE avg(t.sensor_value_raw) 
            END AS sensorvalue
FROM (

      SELECT date_trunc('min', time_series) as t5 
      FROM generate_series('2015-06-10 11:00:00'::timestamp, '2015-06-11 12:00:00', 
                   '5 min') as time_series

   ) grid
LEFT JOIN device_history_log t 
     ON t.last_update >= grid.t5 AND t.last_update <  grid.t5 +  interval '5 min' AND (t.deviceid = 493417852 or t.deviceid = 20865735)
GROUP  BY grid.t5
ORDER  BY grid.t5

The resulting data looks like this:

"2015-06-10 11:00:00.000000";23.8999996185303
"2015-06-10 11:05:00.000000";21.0599994659424
"2015-06-10 11:10:00.000000";21.1200008392334
"2015-06-10 11:15:00.000000";21.1800003051758
"2015-06-10 11:20:00.000000";21.25
"2015-06-10 11:25:00.000000";21.3099994659424
"2015-06-10 11:30:00.000000";21.3700008392334
"2015-06-10 11:35:00.000000";21.4300003051758
"2015-06-10 11:40:00.000000";-1
"2015-06-10 11:45:00.000000";22.8100004196167
"2015-06-10 11:50:00.000000";-1
"2015-06-10 11:55:00.000000";-1
"2015-06-10 12:00:00.000000";22.9250001907349

I need averages because more than one sensor reports back, so each interval needs a single averaged value.

My question is whether the row for 11:40:00 can carry the previous value instead of -1. The temperature sensors only report a change in temperature, so 'no data' means 'no change'.

Also, I'd like any leading rows that are -1 (and so have no previous data) to be removed or ignored. I can do that when parsing the data, but if it can be done at the query level, even better.

Thus my ideal result set would be like this:

"2015-06-10 11:00:00.000000";23.8999996185303
"2015-06-10 11:05:00.000000";21.0599994659424
"2015-06-10 11:10:00.000000";21.1200008392334
"2015-06-10 11:15:00.000000";21.1800003051758
"2015-06-10 11:20:00.000000";21.25
"2015-06-10 11:25:00.000000";21.3099994659424
"2015-06-10 11:30:00.000000";21.3700008392334
"2015-06-10 11:35:00.000000";21.4300003051758
"2015-06-10 11:40:00.000000";21.4300003051758
"2015-06-10 11:45:00.000000";22.8100004196167
"2015-06-10 11:50:00.000000";22.8100004196167
"2015-06-10 11:55:00.000000";22.8100004196167
"2015-06-10 12:00:00.000000";22.9250001907349

Thank you!

  • It would be great if you supplied a test case with CREATE TABLE and INSERT statements for sample data, plus a hand-crafted "expected results" table (stackoverflow.com/tags/postgresql/info). Commented Jun 15, 2015 at 14:56
  • possible duplicate of PostgreSQL use value from previous row if missing Commented Jun 15, 2015 at 15:02
  • @Nicolai - Thank you! Added the information. Commented Jun 15, 2015 at 15:05
  • @Kristján - I read that post but my data is aggregated with averages that makes it a bit complicated (for me at least)... Commented Jun 15, 2015 at 15:07

1 Answer

The query below fills the empty values in the result set of your original query. The method splits the rows into partitions according to a running count of non-empty values, so each non-empty row starts a new partition and the empty rows that follow it fall into the same one. It then selects the first (non-empty) value from each partition (add * to the select list to see how it works).

WITH survey AS (

    SELECT grid.t5||'.000000' as ts, 
        CASE
            WHEN avg(t.sensor_value_raw) ISNULL THEN -1
            ELSE avg(t.sensor_value_raw) 
        END AS sensorvalue
    FROM (

          SELECT date_trunc('min', time_series) as t5 
          FROM generate_series('2015-06-10 11:00:00'::timestamp, '2015-06-11 12:00:00', 
                       '5 min') as time_series

       ) grid
    LEFT JOIN device_history_log t 
         ON t.last_update >= grid.t5 AND t.last_update <  grid.t5 +  interval '5 min' AND (t.deviceid = 493417852 or t.deviceid = 20865735)
    GROUP  BY grid.t5
    ORDER  BY grid.t5)

SELECT
    ts, first_value(sensorvalue) OVER (PARTITION BY part ORDER BY ts) AS sensorvalue
FROM (  
    SELECT *, sum((sensorvalue != -1)::int) OVER (ORDER BY ts) AS part
    FROM survey) alias
ORDER BY ts
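
The partition trick is easier to follow on a tiny input. The sketch below is a variation on the query above rather than the same code: it assumes empty buckets are left as NULL instead of -1, uses count() as the running counter, and adds a filter for the leading rows that have nothing to carry forward. The inline survey rows and the marked CTE name are made up for illustration.

```sql
-- Toy input: NULL marks a 5-minute bucket with no readings (illustrative values).
WITH survey (ts, sensorvalue) AS (
    VALUES
        ('2015-06-10 10:55:00'::timestamp, NULL::float8),  -- leading gap, no earlier value
        ('2015-06-10 11:00:00', 22.73),
        ('2015-06-10 11:05:00', NULL),
        ('2015-06-10 11:10:00', NULL),
        ('2015-06-10 11:15:00', 21.18)
),
marked AS (
    -- count() ignores NULLs, so the running count only increases on rows
    -- that carry a value; each gap row keeps the group number of the
    -- last row that had one.
    SELECT ts, sensorvalue,
           count(sensorvalue) OVER (ORDER BY ts) AS part
    FROM survey
)
SELECT ts,
       -- The first row of every group is the one that holds the value.
       first_value(sensorvalue) OVER (PARTITION BY part ORDER BY ts) AS sensorvalue
FROM marked
WHERE part > 0   -- group 0 holds only leading gaps, so those rows are dropped
ORDER BY ts;
```

Here the 10:55 row is dropped, the 11:05 and 11:10 rows repeat 22.73, and 11:15 keeps its own 21.18. To try the same idea on the real data, the CASE expression in survey would presumably be replaced by a plain avg(t.sensor_value_raw), so that empty buckets stay NULL.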
2 Comments

Thanks @klin, but I need the avg because more than one deviceid (temperature sensor) is reporting back. I've updated the data to make that more explicit.
@Kostas: see changed answer.
