PostgreSQL Group By Sum

Question

I've been scratching my head on this problem in PostgreSQL. I have a table test with 2 columns: - id and content. e.g.

create table test (id integer, 
                   content varchar(1024));

insert into test (id, content) values 
    (1, 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'),
    (2, 'Lorem Ipsum has been the industrys standard dummy text '),
    (3, 'ever since the 1500s, when an unknown printer took a galley of type and scrambled it to'),
    (4, 'make a type specimen book.'),
    (5, 'It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.'),
    (6, 'It was popularised in the 1960s with the release of Letraset sheets containing Lorem '),
    (7, 'Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker'),
    (8, ' including versions of Lorem Ipsum.');

If I run the following query ...

select id, length(content) as characters from test order by id

... then I get: -

id | characters
---+-----------
 1 |         74
 2 |         55
 3 |         87
 4 |         26
 5 |        120
 6 |         85
 7 |         87
 8 |         35

What I want to do is group the id into rows where the sum of the content goes over a threshold. For example, if that threshold is 100 then the desired result would look like the following: -

ids | characters
----+-----------   
1,2 |        129
3,4 |        113    
5   |        120
6,7 |        172    
8   |         35

NOTE (1): - The query doesn't need to generate a characters column - just the ids - they are here to communicate that they are all over 100 - except for the last row which is 35.

NOTE (2): - ids could be a comma-delimited string or a PostgreSQL array - the type is less important than the values

Can I use a window function to do this or do I need something more complex like a lateral join?

The answer to your question is that you need something even more complicated: recursive CTEs. The performance will not be particularly good. — Gordon Linoff
– Gordon Linoff, Commented Nov 9, 2016 at 12:51
I've accepted @Abelisto answer as this is the answer I'm using in my code. — bobmarksie
– bobmarksie, Commented Nov 9, 2016 at 15:37
However, really impressed with @Gordon's answer as I learned a lot just by trying to understand it. Thanks everyone! — bobmarksie
– bobmarksie, Commented Nov 9, 2016 at 15:38

Gordon Linoff · Accepted Answer · 2016-11-09 12:46:34Z

5

This type of problem requires a recursive CTE (or similar functionality). Here is an example:

with recursive t as (
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, ids, seqnum, grp) as (
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      select t.id,
             t.len,
             (case when cte.cumelen >= 100 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 100 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum
             (case when cte.cumelen >= 100 then cte.grp + 1 else cte.grp end) as ids,
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
select grp, max(ids)
from cte
group by grp;

Here is a small working example:

with recursive test as (
      select 1 as id, 'abcd'::text as content union all
      select 2 as id, 'abcd'::text as content union all
      select 3 as id, 'abcd'::text as content 
     ),
     t as (
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, cumelen, ids, seqnum, grp) as (
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      select t.id,
             t.len,
             (case when cte.cumelen >= 5 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 5 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum::int,
             (case when cte.cumelen >= 5 then cte.grp + 1 else cte.grp end)
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
select grp, max(ids)
from cte
group by grp;

edited Nov 9, 2016 at 12:46

answered Nov 9, 2016 at 12:23

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

Sign up to request clarification or add additional context in comments.

11 Comments

Thomas G Over a year ago

It's not what I understand about the question. He wants to SUM consecutive values until the sum value reaches a certain treshold and then make a break and continue the sum with the next consecutive values. I am also scratching my head. Not sure if a solution exist in pure SQL without a procedure

Thomas G Over a year ago

you're damn fast... I thought to that as well and was coding a recursive CTE on SQLfiddle. Unbeatable Gordon :)

David דודו Markovitz Over a year ago

The aggregation can be skipped by marking the rows of the last elements of the groups (where accumulated sum > 100) and filtering by it.

Gordon Linoff Over a year ago

@DuduMarkovitz . . . Given how slow the recursive CTE is likely to be, that might not be a useful optimization.

Clodoaldo Neto Over a year ago

When I use the first query with a created table I get: ERROR: recursive query "t" does not have the form non-recursive-term UNION [ALL] recursive-term SQL Fiddle

|

Abelisto · Accepted Answer · 2016-11-12 16:35:30Z

2

Using stored functions allows to avoid (sometime) the head-breaking queries.

create or replace function fn_foo(ids out int[], characters out int) returns setof record language plpgsql as $$
declare
  r record;
  threshold int := 100;
begin
  ids := '{}'; characters := 0;
  for r in (
    select id, coalesce(length(content),0) as lng
    from test order by id)
  loop
    characters := characters + r.lng;
    ids := ids || r.id;
    if characters > threshold then
      return next;
      ids := '{}'; characters := 0;
    end if;
  end loop;
  if ids <> '{}' then
    return next;
  end if;
end $$;

select * from fn_foo();

╔═══════╤════════════╗
║  ids  │ characters ║
╠═══════╪════════════╣
║ {1,2} │        129 ║
║ {3,4} │        113 ║
║ {5}   │        120 ║
║ {6,7} │        172 ║
║ {8}   │         35 ║
╚═══════╧════════════╝
(5 rows)

edited Nov 12, 2016 at 16:35

answered Nov 9, 2016 at 13:19

Abelisto

15.8k3 gold badges38 silver badges47 bronze badges

1 Comment

bobmarksie Over a year ago

Excellent, this works! Assuming the performance would be quite good too as it's simply looping over. I also added a threshold integer default 100, parameter at the start so I can override the threshold!

kometen · Accepted Answer · 2019-03-20 08:20:57Z

Here I have a query which uses the LEAD() window function

SELECT id || ',' || next_id, characters + next_characters total_characters 
FROM  (SELECT id, characters, row_num, 
              CASE 
                WHEN row_num % 2 = 0 
                     AND characters < 100 THEN Lead(id) OVER(ORDER BY id) 
                ELSE NULL 
              END next_id, 
              CASE 
                WHEN row_num % 2 = 0 
                     AND characters < 100 THEN NULL 
                ELSE Lead(characters) OVER(ORDER BY id) 
              END AS next_characters 
       FROM  (SELECT id, 
                     Length(content)  AS characters, 
                     Row_number() 
                       OVER( 
                         ORDER BY id) row_num 
              FROM   test 
              ORDER  BY id)) 
WHERE  next_id IS NULL;

Hope this may help you.

Collectives™ on Stack Overflow

PostgreSQL Group By Sum

3 Answers 3

11 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

11 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related