
In Python (pandas) I read from my database and then use a pivot table to aggregate the data each day. The raw data I am working with is about 2 million rows per day, one row per person per 30-minute interval. I am aggregating it to daily granularity so it is much smaller for visualization.

So in pandas, I read each date into memory, aggregate it, and then load it into a fresh table in Postgres.

How can I do this directly in Postgres? Can I loop through each unique report_date in my table, group by, and then append the result to another table? I assume doing it in Postgres would be fast compared to reading the data over the network in Python, writing a temporary .csv file, and then writing it back over the network.

1 Answer

Here's an example: Suppose that you have a table

CREATE TABLE post (
    posted_at timestamptz not null,
    user_id integer not null,
    score integer not null
);

representing the scores various users have earned from posts they made on an SO-like forum. Then the following query

SELECT user_id, posted_at::date AS day, sum(score) AS score
FROM post
GROUP BY user_id, posted_at::date;

will aggregate the scores per user per day.
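
The question also asks about appending the aggregate to another table. That is a single INSERT ... SELECT, and no per-date loop is needed, since GROUP BY already processes every day in one pass. A minimal sketch, with a destination table name (daily_score) of my own choosing:

-- Hypothetical destination table for the daily aggregates.
CREATE TABLE daily_score (
    user_id integer not null,
    day     date    not null,
    score   integer not null
);

-- Aggregate and append entirely inside Postgres, in one statement.
INSERT INTO daily_score (user_id, day, score)
SELECT user_id, posted_at::date AS day, sum(score) AS score
FROM post
GROUP BY user_id, posted_at::date;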

Note that posted_at::date converts using the session's TimeZone setting, so with a UTC session (a common server default) the day changes at 00:00 UTC (like SO does). If you want a different boundary, say midnight Paris time, then you can do it like so:

SELECT user_id, (posted_at AT TIME ZONE 'Europe/Paris')::date AS day, sum(score) AS score
FROM post
GROUP BY user_id, (posted_at AT TIME ZONE 'Europe/Paris')::date;

To get good performance for the above queries, you might want to create an expression index covering (user_id, day), and similarly for the second case.
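
One caveat: a bare posted_at::date is not allowed in an index expression, because casting timestamptz to date reads the TimeZone setting and is therefore not immutable; pinning the zone with AT TIME ZONE fixes that. A minimal sketch (the index name is my own):

-- timestamptz::date is only STABLE (it depends on the TimeZone setting),
-- so the index expression pins a zone to make it immutable.
CREATE INDEX post_user_day_idx
    ON post (user_id, ((posted_at AT TIME ZONE 'UTC')::date));

For the planner to use this index, the query has to group on that same expression, i.e. (posted_at AT TIME ZONE 'UTC')::date rather than a bare posted_at::date.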
