On PostgreSQL 9.4, I'm trying to apply what I'll call an "aggregate function" to runs of contiguous rows. Example:

Input data:

recipe  prod1   prod2   timestamp
0       5       4       2015-07-02 08:10:34.357
0       2       7       2015-07-02 08:13:45.352
0       7       0       2015-07-02 08:16:22.098
1       3       2       2015-07-02 08:22:14.678
1       9       4       2015-07-02 08:22:56.123
2       2       6       2015-07-02 08:26:37.564
2       1       7       2015-07-02 08:27:33.109
2       0       8       2015-07-02 08:31:11.687
0       3       5       2015-07-02 08:40:01.345
1       4       2       2015-07-02 08:42:23.210

Desired output:

recipe  prod1_sum   prod2_avg   timestamp_first             timestamp_last
0       14          3.6666      2015-07-02 08:10:34.357     2015-07-02 08:16:22.098
1       12          3           2015-07-02 08:22:14.678     2015-07-02 08:22:56.123
2       3           7           2015-07-02 08:26:37.564     2015-07-02 08:31:11.687
0       3           5           2015-07-02 08:40:01.345     2015-07-02 08:40:01.345
1       4           2           2015-07-02 08:42:23.210     2015-07-02 08:42:23.210

Basically, I want one output line for each "group" of contiguous rows (when the table is sorted on the timestamp column) with the same "recipe" value. In the output, prod1_sum is the sum of prod1 in the "group", prod2_avg is the average of prod2 in the same "group", and the last two columns are respectively the first and the last timestamps in the group. Note that there can be several distinct groups with the same "recipe" value, and I do want an output row for each of them.

At the moment, I have an ugly way of obtaining this, based on several queries and a lot of data processing outside of the DB. I really want to avoid that, and it is not worth showing.

My problem really is the "grouping" of rows. I know how to create an aggregate function which would do what I want, if I could apply it to each group individually. I have looked into window functions, but it seems they would group all values by recipe, which does not respect the "contiguous rows" principle I need.

1 Answer

You can use the following query:

SELECT recipe, SUM(prod1) AS prod1_sum,
       AVG(prod2) AS prod2_avg,
       MIN(timestamp) AS timestamp_first, MAX(timestamp) AS timestamp_last
FROM (
   SELECT recipe, prod1, prod2, timestamp,
          -- position of the row in the whole table, by time
          ROW_NUMBER() OVER (ORDER BY timestamp)
          -
          -- position of the row among rows of the same recipe
          ROW_NUMBER() OVER (PARTITION BY recipe
                             ORDER BY timestamp) AS grp
   FROM mytable ) t   -- t is the alias PostgreSQL 9.4 requires for a subquery in FROM
GROUP BY recipe, grp
ORDER BY timestamp_first;

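The trick here is to use the ROW_NUMBER window function to identify islands of consecutive rows with the same recipe value. Within such an island both row numbers increase by one per row, so their difference, the calculated field grp, stays constant; it changes as soon as rows with a different recipe intervene. Two different recipes can end up with the same grp value, which is why the outer query groups by both recipe and grp.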
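If it helps to see the numbers, here is a small Python sketch that emulates the two ROW_NUMBER calls on the sample data from the question. It is only an illustration of the idea, not what PostgreSQL does internally; the names island_keys and aggregate are made up for this example, and the rows are assumed to be already sorted by timestamp.

```python
from collections import defaultdict

# Sample rows from the question, already sorted by timestamp:
# (recipe, prod1, prod2, timestamp)
rows = [
    (0, 5, 4, "2015-07-02 08:10:34.357"),
    (0, 2, 7, "2015-07-02 08:13:45.352"),
    (0, 7, 0, "2015-07-02 08:16:22.098"),
    (1, 3, 2, "2015-07-02 08:22:14.678"),
    (1, 9, 4, "2015-07-02 08:22:56.123"),
    (2, 2, 6, "2015-07-02 08:26:37.564"),
    (2, 1, 7, "2015-07-02 08:27:33.109"),
    (2, 0, 8, "2015-07-02 08:31:11.687"),
    (0, 3, 5, "2015-07-02 08:40:01.345"),
    (1, 4, 2, "2015-07-02 08:42:23.210"),
]


def island_keys(rows):
    """Return (recipe, grp) per row, where grp is the overall row number
    minus the row number within the same recipe."""
    seen_per_recipe = defaultdict(int)
    keys = []
    for overall, row in enumerate(rows, start=1):
        recipe = row[0]
        seen_per_recipe[recipe] += 1
        keys.append((recipe, overall - seen_per_recipe[recipe]))
    return keys


def aggregate(rows):
    """Group rows by (recipe, grp) and compute the aggregates of the query."""
    groups = {}  # dicts keep insertion order, so islands stay in time order
    for key, row in zip(island_keys(rows), rows):
        groups.setdefault(key, []).append(row)
    result = []
    for (recipe, _grp), members in groups.items():
        result.append((
            recipe,
            sum(m[1] for m in members),                 # prod1_sum
            sum(m[2] for m in members) / len(members),  # prod2_avg
            min(m[3] for m in members),                 # timestamp_first
            max(m[3] for m in members),                 # timestamp_last
        ))
    return result


if __name__ == "__main__":
    for key, row in zip(island_keys(rows), rows):
        print(key, row)
    for line in aggregate(rows):
        print(line)
```

On this data the grp values come out as 0, 0, 0, 3, 3, 5, 5, 5, 5, 7. The three rows of recipe 2 and the later single row of recipe 0 both get grp = 5, so grouping by grp alone would merge them; grouping by (recipe, grp) keeps the five islands apart.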

1 Comment

This is absolutely awesome and spot on! The way you wrote it makes it crystal clear. I will dig into the documentation to better understand the details (ROW_NUMBER() OVER, the "t" after FROM mytable, ...), but I'm on track now. Thank you very much.
