0

I would like to create a graph for an application based on data in a postgresql DB. Therefore, I would like to create the average of the last X hours (e.g. 2 hours) of my value for a variable timespan (e.g. Every 10 minutes) for a total timeframe of Y hours (e.g. 8 hours).

Image: https://i.ibb.co/C8v1mXD/Bildschirmfoto-2019-09-03-um-11-52-51.png

My postgreSQL DB has a, id, a value and a timestamp column. I tried a lot to work with "group by" and "over" but unfortunately I did not achieve my goal. Maybe some of you are so nice and able to help me?

Image: https://i.ibb.co/sKpYCbJ/Bildschirmfoto-2019-09-03-um-12-03-42.png

1
  • Please provide proper table definition (CREATE TABLE statement), minimal sample data (INSERT statement) and desired result as text. And show what you tried, even if it's not working. Commented Sep 3, 2019 at 10:52

2 Answers 2

1

If you're on PG11+ then ranged window functions may help you:

SELECT  
  avg(t.average_me) OVER(ORDER BY t.timestamp_col RANGE BETWEEN INTERVAL '3 hour' PRECEDING AND CURRENT ROW) as a 
FROM yourtable t;

If you have rows with a timestamp_col then for every row R this will calculate the average of the average_me for all rows between R's timestamp_col and a date 10 hours before it. You can move the window too:

SELECT  
  avg(t.average_me) OVER(ORDER BY t.timestamp_col RANGE BETWEEN INTERVAL '3 hour' PRECEDING AND INTERVAL '2 hour' PRECEDING) as a 
FROM yourtable t;

This will calc, for a row R having a timestamp_col of 2000-01-01 12:00:00, the average of all rows whose timestamp_col is between 2000-01-01 9:00:00 and 2000-01-01 10:00:00

Update after my comment (untested):

SELECT x.* FROM(

 SELECT  
  CASE WHEN kind = avgpoint' THEN 
    avg(t.average_me) OVER(ORDER BY t.timestamp_col RANGE BETWEEN INTERVAL '2 hour' PRECEDING AND INTERVAL '1 hour' PRECEDING)
  END as a 
 FROM 
 (
  --your data
  SELECT 'datarow' as kind, average_me, timestamp_col 
  FROM yourtable;

  UNION ALL

  --your checkpoints, every 15 minutes from 10h ago to now
  SELECT 'avgpoint', null, g.v 
  FROM generate_series(
    now()-'10 hours'::interval, 
    now(),
    '15 minute'::interval
  ) as g(v)
 ) t
) x
WHERE x.kind = 'avgpoint'

It inserts a bunch of 15 minute intervals into the data stream, with a different kind(so it can be detected). For every 'avgpoint' kind row the AVG()OVER() looks back at the data between 2 hours and 1 hour ago and averages it. This maens that every 15 minutes you get the previous previous hour average: at noon, you get the avg from 10am to 11am. At 12:15pm you get 10:15 to 11:15 etc

Sign up to request clarification or add additional context in comments.

6 Comments

Amazing code, exactly what I wanted! Is there a way to execute this code not for all rows, but for rows each X minutes?
I didn't really understand your "not for all rows but for each x minutes" - do you mean to use a where clause? When I said "for every row" I meant every row in the result set, but If you only want it for rows where the "productype = 'car'" put that as a where clause. If your AVG()OVER() is in the same sql as the WHERE then the where filters first and later the average is done. on the filtered set# If you wrap the SELECT containing the AVG in an outer query, then the AVG will consider all rows, but then the filtering to only cars will take place. Which you choose depends what you want to averag
Thanks for all the input. With "not for all rows but for each x minutes" I meant that I do not want to calculate the average for each available row but just for each Xth row defined by a timeframe. So doing the exact calculation like in your post but just affecting rows with a time difference of X minutes to the last row (Setting an interval for the graph). I hope this was explained well enough to understand :)
Do you always have a row available at that time eg do rows always generate every one minute and you want every 15 minute row (:00, :15, :30, :45) to have the average of,say,the last two hours? If rows aren't reliably available we will have to generate some you see. If they are available we can just use them based on some criteria (case when extract minute from timestampcolumn in (0,15,30,45))
Alright I understand. Its data from a machine and if it is not producing there are no datapoints unfortunately. So probably I have to create a series then with generate series somehow?
|
0

What do you think about this :

with myFictiveTable as (
 select row_number() over () id ,* from (
    select generate_series(now()-'3 days'::interval,now(),'10 minutes'::interval) "timestamp"
           ,(random()*100)::int ftz_1
   ) a
  )
 select t0,t1,avg(ftz_1) from (
      select t0,COALESCE(lead(t0) over (ORDER BY t0),now()) t1 from (
    select generate_series(now()-'8 hours'::interval,now(),'2 hours'::interval) t0
     )  a
    ) mytimeLaps
   join myFictiveTable on ("timestamp">=t0 and "timestamp"<t1)
   group by t0,t1

2 Comments

You are a genious! The comparison with the timepan works perfectly fine. But how can I insert now the real ftz_1 values instead of the random created ones?
the block "with" is just there to create a sample table, delete the block, start on the "select t0, t1, avg (ftz_1) from ..." and replace "myFictiveTable" by the name of your table.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.