Postgres SQL GROUP BY without jumping rows?

Question

Assuming I have this data in a table:

 id | thing | operation | timestamp
----+-------+-----------+-----------
  0 | foo   |       add |         0
  0 | bar   |       add |         1
  1 | baz   |    remove |         2
  1 | dim   |       add |         3
  0 | foo   |    remove |         4
  0 | dim   |       add |         5

Is there any way to construct a Postgres SQL query that will group by id and operation but without grouping rows with a higher timestamp value over those with lower? I want to get this out of the query:

 id |  things  | operation
----+----------+-----------
  0 | foo, bar |       add
  1 |      baz |    remove
  1 |      dim |       add
  0 |      foo |    remove
  0 |      dim |       add

Basically group by, but only over adjacent rows sorted by timestamp.

Data is not ordered; the rows may come in any order. In SQL there are no "adjacent rows".
– jarlh, Commented Feb 17, 2015 at 11:00
If you want an ORDER, use ORDER BY. Otherwise there is no ORDER.
– Frank Heikens, Commented Feb 17, 2015 at 11:02

GarethD · Accepted Answer · 2015-02-17 16:19:00Z

This is a gaps-and-islands problem (although this article is directed at SQL Server, it describes the problem very well and applies equally to PostgreSQL), and it can be solved using ranking functions:

SELECT  id,
        thing,
        operation,
        timestamp,
        -- overall position minus position within (id, operation):
        -- the difference stays constant within an island
        ROW_NUMBER() OVER(ORDER BY timestamp) -
                ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY timestamp) AS groupingSet,
        ROW_NUMBER() OVER(ORDER BY timestamp) AS PositionInSet,
        ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY timestamp) AS PositionInGroup
FROM    T
ORDER BY timestamp;

As you can see, by taking the overall position within the set and deducting the position within the group, you can identify the islands: each unique combination of (id, operation, groupingSet) represents one island:

id  thing   operation   timestamp   groupingSet PositionInSet   PositionInGroup
0   foo     add         0           0           1               1
0   bar     add         1           0           2               2           
1   baz     remove      2           2           3               1
1   dim     add         3           3           4               1
0   foo     remove      4           4           5               1
0   dim     add         5           3           6               3
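
To check the arithmetic outside the database, here is a minimal Python sketch that recomputes the groupingSet column for the sample rows from the question. The function name `grouping_sets` and the tuple layout are assumptions made for illustration; the sketch only mirrors the two ROW_NUMBER() calls and is not a replacement for the query.

```python
from collections import defaultdict

# Sample rows from the question: (id, thing, operation, timestamp)
rows = [
    (0, "foo", "add", 0),
    (0, "bar", "add", 1),
    (1, "baz", "remove", 2),
    (1, "dim", "add", 3),
    (0, "foo", "remove", 4),
    (0, "dim", "add", 5),
]


def grouping_sets(rows):
    """Return (id, thing, operation, timestamp, grouping_set) for each row.

    grouping_set = position in the whole set - position in the
    (id, operation) partition, both counted from 1 in timestamp order.
    """
    seen = defaultdict(int)  # running row count per (id, operation)
    labelled = []
    ordered = sorted(rows, key=lambda r: r[3])
    for position_in_set, (id_, thing, op, ts) in enumerate(ordered, start=1):
        seen[(id_, op)] += 1
        position_in_group = seen[(id_, op)]
        labelled.append((id_, thing, op, ts, position_in_set - position_in_group))
    return labelled


labelled = grouping_sets(rows)
for row in labelled:
    print(row)
```

Note that the rows at timestamps 3 and 5 share the same difference but have different ids, which is why the island key has to include id and operation as well as the difference.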

Then you just need to put this in a subquery, and group by the relevant fields, and use string_agg to concatenate your things:

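-- string_agg needs an explicit delimiter in PostgreSQL;
-- ORDER BY inside the aggregate keeps the things in timestamp order
SELECT  id, STRING_AGG(thing, ', ' ORDER BY timestamp) AS things, operation
FROM    (   SELECT  id,
                    thing,
                    operation,
                    timestamp,
                    ROW_NUMBER() OVER(ORDER BY timestamp) -
                            ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY timestamp) AS groupingSet
            FROM    T
        ) AS t
GROUP BY id, operation, groupingSet
ORDER BY MIN(timestamp);  -- return the islands in timestamp order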
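
As a cross-check of the result the question asks for, the sketch below collapses adjacent rows in Python with `itertools.groupby`, which only merges consecutive items with the same key (much like the unix uniq tool). This is a different technique from the row-number query above, shown only to confirm the expected grouping on the sample data; the function name `collapse_adjacent` is an assumption for illustration.

```python
from itertools import groupby

# Sample rows from the question: (id, thing, operation, timestamp)
rows = [
    (0, "foo", "add", 0),
    (0, "bar", "add", 1),
    (1, "baz", "remove", 2),
    (1, "dim", "add", 3),
    (0, "foo", "remove", 4),
    (0, "dim", "add", 5),
]


def collapse_adjacent(rows):
    """Merge runs of consecutive rows (by timestamp) sharing id and operation."""
    ordered = sorted(rows, key=lambda r: r[3])
    result = []
    # groupby starts a new group whenever the key changes, so rows with the
    # same (id, operation) that are not adjacent stay in separate groups.
    for (id_, op), run in groupby(ordered, key=lambda r: (r[0], r[2])):
        result.append((id_, ", ".join(r[1] for r in run), op))
    return result


collapsed = collapse_adjacent(rows)
for row in collapsed:
    print(row)
```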

jarlh · Accepted Answer · 2015-02-17 11:21:47Z

Perhaps this works, if your sample data is good enough:

select id, string_agg(thing,',') as things, operation
from tablename
group by id, operation

I.e. use id and operation to find things to concat.

Edited, now using string_agg instead of group_concat.

edited Feb 17, 2015 at 11:21

answered Feb 17, 2015 at 11:02

5 Comments

Moshev Over a year ago

I'm sorry, the original data didn't include enough rows to show the problem. A simple group by doesn't work, because I don't want to group all things with the same keys, just things that happen to be next to each other in sort order. Kind of like the unix uniq commandline tool.

Frank Heikens Over a year ago

I don't think group_concat() is a standard PostgreSQL function; it looks like you need string_agg() or a custom function.

jarlh Over a year ago

@FrankHeikens, sorry, I can't always remember which DBMS supports those non-ANSI functions.

jarlh Over a year ago

@Moshev, do it outside SQL, using a cursor.

Madhivanan Over a year ago

group_concat is for MySQL

ndpu · Accepted Answer · 2015-02-17 14:40:48Z

You can count the distinct operations for each id and use that count to UNION two SELECTs against the table:

WITH cnt AS (
  SELECT id, operations_cnt FROM (
    SELECT id, array_length(array_agg(DISTINCT operation),1) AS operations_cnt
    FROM test GROUP BY id
  ) AS t
  WHERE operations_cnt=1
)
SELECT id, string_agg(things, ','), operation, MAX(timestamp) AS timestamp
FROM test
WHERE id IN (SELECT id FROM cnt) GROUP BY id, operation
UNION ALL
SELECT id, things, operation, timestamp
FROM test
WHERE id NOT IN (SELECT id FROM cnt)
ORDER BY timestamp;

result:

 id | string_agg | operation | timestamp 
----+------------+-----------+-----------
  0 | foo,bar    | add       |         1
  1 | baz        | remove    |         2
  1 | dim        | add       |         3
  2 | foo        | remove    |         4
  2 | dim        | add       |         5
(5 rows)
