Grouping of PostgreSQL data

Question

I have a PostgreSQL table that records events by date/time. The table has the columns id, event and timestamp.

My output has to be something like this:

'Day', '1st Timers', '2nd Timers', '3rd Timers', '3+ Timers'

1st timers are all ids that have done the event for the first time, 2nd timers are all ids that have done the event for the second time, and so on.

Is this possible using a single SQL query?

edit: Sample data and output as per request

user_id  date            event
1        09/03/15 14:08  opened
2        10/03/15 14:08  opened
1        11/03/15 14:08  opened
4        14/03/15 14:08  opened
1        15/03/15 14:08  opened
5        16/03/15 14:08  opened
1        17/03/15 14:08  opened
4        17/03/15 14:08  opened
6        18/03/15 14:08  opened
1        18/03/15 14:08  opened
6        18/03/15 14:08  other


Output (for event=opened)
date        1time   2times  3times  4times  5times
09/03/15    1       0       0       0       0
10/03/15    1       0       0       0       0
11/03/15    0       1       0       0       0
14/03/15    1       0       0       0       0
15/03/15    0       0       1       0       0
16/03/15    1       0       0       0       0
17/03/15    0       1       0       1       0
18/03/15    1       0       0       0       1

As always, your version of Postgres, please. It's relevant for the best solution.
– Erwin Brandstetter, Commented Apr 9, 2015 at 12:34
If a user does an event two times on his/her first day, does (s)he count as "1st-timer" and "2nd-timer"?
– Erwin Brandstetter, Commented Apr 9, 2015 at 12:42

Gordon Linoff · Accepted Answer · 2015-04-09 23:06:55Z

4

For each date, you seem to want to count the number of users that hit 1 time, 2 times, and so on. I see this as a row_number() followed by conditional aggregation:

select thedate,
       sum(case when seqnum = 1 then 1 else 0 end) as time_1,
       sum(case when seqnum = 2 then 1 else 0 end) as time_2,
       sum(case when seqnum = 3 then 1 else 0 end) as time_3,
       sum(case when seqnum = 4 then 1 else 0 end) as time_4,
       sum(case when seqnum = 5 then 1 else 0 end) as time_5
from (select t.*, date_trunc('day', date) as thedate,
             row_number() over (partition by user_id order by date_trunc('day', date)) as seqnum
      from tbl t  -- tbl stands in for your table name ("table" itself is a reserved word)
      where event = 'opened'
     ) t
group by thedate
order by thedate;
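
To sanity-check the query above against the sample data from the question, here is a minimal sketch that runs the same row_number() plus conditional-aggregation pattern in an in-memory SQLite database from Python. SQLite is only a stand-in for Postgres here (an assumption, not what the asker uses): it needs SQLite 3.25 or later for window functions, date() replaces date_trunc('day', ...), and the table name tbl and column name event_time are placeholders.

```python
import sqlite3

# Sample rows from the question, with the dd/mm/yy dates rewritten as ISO timestamps.
ROWS = [
    (1, "2015-03-09 14:08", "opened"),
    (2, "2015-03-10 14:08", "opened"),
    (1, "2015-03-11 14:08", "opened"),
    (4, "2015-03-14 14:08", "opened"),
    (1, "2015-03-15 14:08", "opened"),
    (5, "2015-03-16 14:08", "opened"),
    (1, "2015-03-17 14:08", "opened"),
    (4, "2015-03-17 14:08", "opened"),
    (6, "2015-03-18 14:08", "opened"),
    (1, "2015-03-18 14:08", "opened"),
    (6, "2015-03-18 14:08", "other"),
]

# Inner query numbers each user's 'opened' events in time order;
# outer query counts, per day, how many events carry each number.
QUERY = """
SELECT thedate,
       sum(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS time_1,
       sum(CASE WHEN seqnum = 2 THEN 1 ELSE 0 END) AS time_2,
       sum(CASE WHEN seqnum = 3 THEN 1 ELSE 0 END) AS time_3,
       sum(CASE WHEN seqnum = 4 THEN 1 ELSE 0 END) AS time_4,
       sum(CASE WHEN seqnum = 5 THEN 1 ELSE 0 END) AS time_5
FROM (SELECT date(event_time) AS thedate,
             row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS seqnum
      FROM tbl
      WHERE event = 'opened') AS t
GROUP BY thedate
ORDER BY thedate
"""


def count_by_visit_number(rows):
    """Load rows into a throwaway SQLite table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (user_id INTEGER, event_time TEXT, event TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = count_by_visit_number(ROWS)
for row in result:
    print(row)
```

The eight rows printed line up with the expected output in the question; for example, 17/03/15 comes out as ('2015-03-17', 0, 1, 0, 1, 0). Because this form uses only a window function and sum(case ...), it avoids both FILTER and crosstab.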

edited Apr 9, 2015 at 23:06

answered Apr 9, 2015 at 11:46

Gordon Linoff


4 Comments

void Over a year ago

Clever use of case and sum with window functions

Anoop Over a year ago

Awesome! It seems to be almost working. The output I get is at pastebin.com/UyDYQ2pr. I believe the correct output for column2 is (column2 - column1). Some slight tweak is required, but I am not able to pinpoint it.

Gordon Linoff Over a year ago

@Anoop . . . I think you just want where event = 'opened'.

Anoop Over a year ago

Correct. I figured it out. Accepting the answer.

Community · Accepted Answer · 2017-05-23 12:06:20Z

2

Aggregate `FILTER`

Starting with Postgres 9.4, use the new aggregate FILTER clause:

SELECT event_time::date
     , count(*) FILTER (WHERE rn = 1) AS times_1
     , count(*) FILTER (WHERE rn = 2) AS times_2
     , count(*) FILTER (WHERE rn = 3) AS times_3
    -- etc.
FROM  (
   SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
   FROM   tbl
   ) t
GROUP  BY 1
ORDER  BY 1;

Related: How can I simplify this game statistics query?

About the cast event_time::date:

How to get the date and time from timestamp in PostgreSQL select query?
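
The "-- etc." in the query leaves the open-ended bucket from the question (the '3+ Timers' column) unspecified. One way to read it, as an assumption, is "every occurrence beyond the third", which in the FILTER style would be a column such as count(*) FILTER (WHERE rn > 3). The following is a rough Python sketch of that logic, not the answer's own code; the function name and the exact parameter are made up for illustration.

```python
from collections import defaultdict


def count_with_overflow(events, exact=3):
    """events: iterable of (user_id, iso_timestamp) for one event type.

    Returns {day: [n_1, ..., n_exact, n_more]}, where n_more counts every
    occurrence whose per-user sequence number is greater than `exact`.
    """
    seen = defaultdict(int)                          # running row_number() per user
    table = defaultdict(lambda: [0] * (exact + 1))
    for user_id, ts in sorted(events, key=lambda e: e[1]):   # ORDER BY event_time
        seen[user_id] += 1
        rn = seen[user_id]
        slot = min(rn, exact + 1) - 1                # rn > exact lands in the last slot
        table[ts[:10]][slot] += 1                    # ts[:10] is the date part
    return dict(table)


# The 'opened' rows from the question's sample data.
opened = [
    (1, "2015-03-09 14:08"), (2, "2015-03-10 14:08"), (1, "2015-03-11 14:08"),
    (4, "2015-03-14 14:08"), (1, "2015-03-15 14:08"), (5, "2015-03-16 14:08"),
    (1, "2015-03-17 14:08"), (4, "2015-03-17 14:08"), (6, "2015-03-18 14:08"),
    (1, "2015-03-18 14:08"),
]
buckets = count_with_overflow(opened)
for day in sorted(buckets):
    print(day, buckets[day])
```

With the sample data, user 1's fourth and fifth events (on 17/03/15 and 18/03/15) both land in the last column instead of needing columns of their own.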

Crosstab

Or use an actual crosstab query (faster). Available for any modern Postgres version. Read this first:

PostgreSQL Crosstab Query

SELECT * FROM crosstab(
       'SELECT event_time::date, rn, count(*)::int AS ct
        FROM  (
           SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY event_time) AS rn
           FROM   tbl
           ) t
        GROUP  BY 1, 2
        ORDER  BY 1'

      ,$$SELECT * FROM unnest ('{1,2,3}'::int[])$$
   ) AS ct (day date, times_1 int, times_2 int, times_3 int);
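
For readers new to crosstab, this is a rough Python model of what the two-parameter form does with the inner query's (day, rn, ct) rows: it spreads them into one row per day with one column per listed category. Categories missing for a day come back as NULL (None here) rather than 0, unlike count(*) FILTER, and values outside the category list are ignored. The function name pivot is hypothetical, and how crosstab treats a day whose rows all fall outside the list is not modeled.

```python
def pivot(rows, categories):
    """rows: (row_name, category, value) tuples, already sorted by row_name.

    Returns one tuple per row_name: (row_name, value_for_cat_1, ...),
    with None where a category has no value for that row_name.
    """
    out = {}
    for day, rn, ct in rows:
        cells = out.setdefault(day, [None] * len(categories))
        if rn in categories:
            cells[categories.index(rn)] = ct   # place value in its category column
        # categories not in the list (e.g. rn = 4) are dropped
    return [(day, *cells) for day, cells in out.items()]


# Long-format rows as the inner query would return them for two sample days.
long_rows = [
    ("2015-03-17", 2, 1),
    ("2015-03-17", 4, 1),
    ("2015-03-18", 1, 1),
    ("2015-03-18", 5, 1),
]
wide = pivot(long_rows, [1, 2, 3])
for row in wide:
    print(row)
```

If zeros are preferred over NULLs in the real query, wrapping the output columns in COALESCE(..., 0) in an outer SELECT is one option.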

edited May 23, 2017 at 12:06

Community Bot


answered Apr 9, 2015 at 12:45

Erwin Brandstetter


4 Comments

Anoop Over a year ago

Thanks, will try. That was just sample data I quickly cooked up; the actual field name is 'event_time'.

Anoop Over a year ago

I'm using Amazon Redshift. I believe crosstab and FILTER are not supported. :(

Erwin Brandstetter Over a year ago

@Anoop: I believe that is something you should have told us up front. Redshift is not Postgres. I did ask for the version, too ...

Anoop Over a year ago

Sorry about that. I'm very new to Redshift. I believed that the underlying DB was Postgres without any differences.
