Assuming I have this data in a table:

 id | thing | operation | timestamp
----+-------+-----------+-----------
  0 | foo   |       add |         0
  0 | bar   |       add |         1
  1 | baz   |    remove |         2
  1 | dim   |       add |         3
  0 | foo   |    remove |         4
  0 | dim   |       add |         5

Is there a way to write a PostgreSQL query that groups by id and operation, but only merges rows that are consecutive when sorted by timestamp, so that a later row is never pulled into an earlier group? I want to get this out of the query:

 id |  things  | operation
----+----------+-----------
  0 | foo, bar |       add
  1 |      baz |    remove
  1 |      dim |       add
  0 |      foo |    remove
  0 |      dim |       add

Basically group by, but only over adjacent rows sorted by timestamp.

  • Data is not ordered, the rows may come in any order - in SQL there are no "adjacent rows". Commented Feb 17, 2015 at 11:00
  • If you want an ORDER, use ORDER BY. Otherwise there is no ORDER Commented Feb 17, 2015 at 11:02
  • I edited my question to add a timestamp column. Commented Feb 17, 2015 at 11:03

3 Answers

9

This is a gaps-and-islands problem (this article is aimed at SQL Server, but it describes the problem very well and applies equally to PostgreSQL), and it can be solved using ranking functions:

SELECT  id,
        thing,
        operation,
        timestamp,
        ROW_NUMBER() OVER(ORDER BY timestamp) - 
                ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS groupingSet,
        ROW_NUMBER() OVER(ORDER BY timestamp) AS PositionInSet,
        ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS PositionInGroup
FROM    T
ORDER BY timestamp;

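As you can see, by taking the overall position within the set and subtracting the position within the group, you can identify the islands: each unique combination of (id, operation, groupingSet) represents one island. Note that groupingSet on its own is not unique (the rows at timestamps 3 and 5 both get 3, but they differ in id), which is why id and operation must stay in the grouping key: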

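 id | thing | operation | timestamp | groupingSet | PositionInSet | PositionInGroup
----+-------+-----------+-----------+-------------+---------------+-----------------
  0 | foo   | add       |         0 |           0 |             1 |               1
  0 | bar   | add       |         1 |           0 |             2 |               2
  1 | baz   | remove    |         2 |           2 |             3 |               1
  1 | dim   | add       |         3 |           3 |             4 |               1
  0 | foo   | remove    |         4 |           4 |             5 |               1
  0 | dim   | add       |         5 |           3 |             6 |               3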

Then you just need to put this in a subquery, and group by the relevant fields, and use string_agg to concatenate your things:

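-- STRING_AGG needs an explicit delimiter in PostgreSQL; the ORDER BY inside
-- the aggregate keeps the things in timestamp order within each island.
SELECT  id, STRING_AGG(thing, ', ' ORDER BY timestamp) AS things, operation
FROM    (   SELECT  id,
                    thing,
                    operation,
                    timestamp,
                    ROW_NUMBER() OVER(ORDER BY timestamp) - 
                            ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY timestamp) AS groupingSet
            FROM    T
        ) AS t
GROUP BY id, operation, groupingSet
-- Return the islands in the order in which they start.
ORDER BY MIN(timestamp);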
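
For comparison, here is a minimal, self-contained sketch of a different way to number the islands. It is not the row-number difference used above: it flags each row whose (id, operation) differs from the previous row with LAG, then takes a running SUM of the flags. The CTE names (t, flagged, numbered) and the column names starts_run and run_no are made up for illustration, and the sample rows are copied from the question. On that data it should return the five rows the question asks for.

```sql
-- Sample rows copied from the question; the CTE name "t" is only for illustration.
WITH t (id, thing, operation, "timestamp") AS (
    VALUES (0, 'foo', 'add',    0),
           (0, 'bar', 'add',    1),
           (1, 'baz', 'remove', 2),
           (1, 'dim', 'add',    3),
           (0, 'foo', 'remove', 4),
           (0, 'dim', 'add',    5)
),
flagged AS (
    -- starts_run is 1 when id or operation differs from the previous row
    -- in timestamp order (the first row has no previous row, so it is 1 too).
    SELECT  t.*,
            CASE WHEN LAG(id) OVER w IS DISTINCT FROM id
                   OR LAG(operation) OVER w IS DISTINCT FROM operation
                 THEN 1 ELSE 0 END AS starts_run
    FROM    t
    WINDOW  w AS (ORDER BY "timestamp")
),
numbered AS (
    -- A running total of the flags gives every run of adjacent rows its own number.
    SELECT  flagged.*,
            SUM(starts_run) OVER (ORDER BY "timestamp") AS run_no
    FROM    flagged
)
SELECT  id,
        STRING_AGG(thing, ', ' ORDER BY "timestamp") AS things,
        operation
FROM    numbered
GROUP BY run_no, id, operation
ORDER BY run_no;
```

Because run_no increases with timestamp, ordering by it returns the groups in the order they occur. This sketch assumes timestamps are unique; with ties, the notion of "adjacent" would need a tie-breaking column in the ORDER BY.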

1

Perhaps this works, if your sample data is good enough:

select id, string_agg(thing,',') as things, operation
from tablename
group by id, operation

I.e. use id and operation to find things to concat.

Edited, now using string_agg instead of group_concat.
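
To see what a plain GROUP BY does with the timestamped data from the edited question, here is a small sketch that runs the same grouping over those six rows. The CTE name t stands in for the real table and is only an assumption for illustration; the ORDER BY inside STRING_AGG is added so the concatenation order is deterministic.

```sql
-- Sample rows copied from the question; the CTE name "t" is only for illustration.
WITH t (id, thing, operation, "timestamp") AS (
    VALUES (0, 'foo', 'add',    0),
           (0, 'bar', 'add',    1),
           (1, 'baz', 'remove', 2),
           (1, 'dim', 'add',    3),
           (0, 'foo', 'remove', 4),
           (0, 'dim', 'add',    5)
)
SELECT  id,
        STRING_AGG(thing, ',' ORDER BY "timestamp") AS things,
        operation
FROM    t
GROUP BY id, operation
ORDER BY id, operation;
```

This yields four rows rather than five: the (0, add) group collects foo, bar and dim together, even though dim (timestamp 5) is separated from the other two by rows with a different id or operation.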

5 Comments

I'm sorry, the original data didn't include enough rows to show the problem. A simple group by doesn't work, because I don't want to group all things with the same keys, just things that happen to be next to each other in sort order. Kind of like the Unix uniq command-line tool.
I don't think group_concat() is a standard PostgreSQL function; it looks like you need string_agg() or a custom function.
@FrankHeikens, sorry, I can't always remember which DBMS supports which non-ANSI functions.
@Moshev, do it outside SQL, using a cursor.
group_concat is for MySQL
0

You can count the distinct operations for each id and use that count to UNION two SELECTs: ids with a single operation are aggregated, and all other ids are returned row by row:

WITH cnt AS (
  SELECT id, operations_cnt FROM (
    SELECT id, array_length(array_agg(DISTINCT operation),1) AS operations_cnt
    FROM test GROUP BY id
  ) AS t
  WHERE operations_cnt=1
)
SELECT id, string_agg(things, ','), operation, MAX(timestamp) AS timestamp
FROM test
WHERE id IN (SELECT id FROM cnt) GROUP BY id, operation
UNION ALL
SELECT id, things, operation, timestamp
FROM test
WHERE id NOT IN (SELECT id FROM cnt)
ORDER BY timestamp;

result:

 id | string_agg | operation | timestamp 
----+------------+-----------+-----------
  0 | foo,bar    | add       |         1
  1 | baz        | remove    |         2
  1 | dim        | add       |         3
  2 | foo        | remove    |         4
  2 | dim        | add       |         5
(5 rows)
