2

I'm using Postresql 9.6 and am trying to figure out how to create a query to calculate the sum of a column based on the discrete sum of a different column (both in the same table). For example, I want the sum of counts for each set of rows until the volume >= 100 and then to start calculating the sum of counts over again.

Example data:

id          count       volume
1           5           12
2           6           14
3           2           11
4           10          9
5           5           14
6           17          19
7           0           8
8           12          4
9           18          6
10          12          14
11          10          10
12          15          7
13          8           12
14          2           17
15          5           30
16          9           24
17          2           16.5

Hypothetical intermediate results (just to understand what I expect):

id         sum(count)  discrete volume
1          5           12
2          11          26
3          13          37
4          23          46
5          28          60
6          45          79          
7          45          87
8          57          91
9          75          97
10         87          111
11         10          10   (reset since volume >= 100)
12         25          17
13         33          29
14         35          46
15         40          76
16         49          100
17         2           16.5   (reset since volume >= 100)

Expected final results:

sum(count) discrete volume
87         111
49         100
2          16.5   (partial result, which is desired)

What I have tried so far: SQL Fiddle

I got idea for the current query shown in SQL Fiddle from another StackOverflow answer and this obviously isn't working correctly (however it is working quite nicely for a different problem I was trying to solve). I saw from a different StackOverflow answer that I probably want to be using a RECURSIVE query (PostgreSQL Documentation) but I can't figure out exactly how to write the query correctly so it works :(

I certainly can write this to work in Java code (and already have), but I want to do it in SQL so it is hopefully faster than reading all the rows and calculating the results. I can also most likely write a stored procedure to make this work, but I would rather like to avoid that as I'm using JPA (in this case with a native query) and want to keep all the code in the same place (the Java codebase). Also, I want to be able to dynamically include/exclude columns, so I would like to construct the query in Java code (the actual table has many more columns than my example).

Thanks for any help you can give.

EDIT:

Thanks to @klin's comment and looking at the referenced StackOverflow issue, I'm getting closer. Here is what I have (generates intermediate results):

WITH RECURSIVE WorkTable(id, count_sum, volume_sum) AS 
(
    SELECT 
        id,
        count AS count_sum,
        volume AS volume_sum
    FROM measurements
    WHERE id = 1
    UNION ALL
    SELECT
        measurements.id,
        CASE WHEN WorkTable.volume_sum >= 100
            THEN measurements.count
            ELSE WorkTable.count_sum + measurements.count
        END AS count_sum,
        CASE
            WHEN WorkTable.volume_sum >= 100
            THEN measurements.volume
            ELSE WorkTable.volume_sum + measurements.volume
        END AS discrete_sum_volume
        FROM measurements
        JOIN WorkTable
        ON measurements.id = WorkTable.id + 1
)
SELECT *
FROM WorkTable
ORDER BY id

However, what I'm still missing is how to get the final result. If I use WHERE volume_sum >= 100 I don't get the final (partial) result. And I can't use OR id = MAX(id) because Postgres doesn't allow that in a WHERE clause.

EDIT: BTW after all the work and time that @klin and @JorgeCampos spent looking at and answering my question, I found that this query doesn't perform for a table with millions of rows. I have created a stored procedure instead; I didn't want to have to go there but there seems to be no other well performing alternative. The stored procedure out-performs the RECURSIVE query by many orders of magnitude for a large table.

8
  • You mean until volume <= 100 right? Commented Apr 20, 2018 at 17:44
  • @JorgeCampos - I tried editing the question based on your question and it seemed to read much less clearly. The reset condition is volume >= 100 Commented Apr 20, 2018 at 17:55
  • There are two approaches to similar problems - iterative and recursive. Examples on both can be found in this thread Find Value per number based on Above row result. Commented Apr 20, 2018 at 18:19
  • @klin thanks for your response. the iterative approach seems to require a stored procedure which I'm trying to avoid (for the reason see the text of the question). I would like a recursive solution if it is possible (I'm not sure). I've been hacking on the recursion query for almost a solid day now, but so far nothing I try is working. Commented Apr 20, 2018 at 18:47
  • For your current attempt the solution would be to create a row number based on that case statement and get only the rows with number=1, however you are assuming for your query that there is no gap on the ids and I particularly disagree with it. Any delete or manual insert on that table would break your functionality entirely. Commented Apr 20, 2018 at 19:30

1 Answer 1

3

I have used row_count() as consecutive row numbers are needed. You should not rather rely on the primary key, because it usually may contain gaps.

with recursive cte as (
    select *, row_number() over (order by id)
    from measurements
),
work_table as (
    select 
        id,
        count as count_sum,
        volume as volume_sum,
        row_number
    from cte
    where row_number = 1
    union all
    select
        c.id,
        case when w.volume_sum >= 100
            then c.count
            else w.count_sum + c.count
        end as count_sum,
        case
            when w.volume_sum >= 100
            then c.volume
            else w.volume_sum + c.volume
        end as discrete_sum_volume,
        c.row_number
    from cte c
    join work_table w
    on c.row_number = w.row_number + 1
)
select count_sum, volume_sum
from work_table
where volume_sum >= 100
or id = (select max(id) from work_table)
order by id

Result:

 count_sum | volume_sum 
-----------+------------
        87 |        111
        49 |        100
         2 |       16.5
(3 rows)

SqlFiddle.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.