I have written a small PostgreSQL query that totals the number of jobs executed per hourly interval, for every day between two given dates, e.g. all the jobs executed between February 2, 2012 and March 3, 2012, hour by hour, starting with the hour given on February 2 and ending with the hour given on March 3. I have noticed that this query doesn't return rows with a count of 0, i.e. hours in which no job was executed, e.g. February 21, 2012 between 5 and 6 pm. How can I make it also return the rows with a 0 count? The code is below:

SELECT date_trunc('hour', executiontime), count(executiontime)
  FROM mytable
 WHERE executiontime BETWEEN '2011-2-2 0:00:00' AND '2012-3-2 5:00:00' 
 GROUP BY date_trunc('hour', executiontime)
 ORDER BY date_trunc('hour', executiontime) ASC;

Thanks in advance.

  • You'll need a numbers table to generate rows for each hour, and then outer join it to mytable to get the count. Commented Aug 1, 2012 at 14:19

2 Answers

-- CTE to the rescue!!!
WITH cal AS (
        -- one row per hour in the requested range
        SELECT generate_series('2012-02-02 00:00:00'::timestamp
                             , '2012-03-02 05:00:00'::timestamp
                             , '1 hour'::interval) AS stamp
        )
, qqq AS (
        -- per-hour job counts from the source table
        SELECT date_trunc('hour', executiontime) AS stamp
        , count(*) AS zcount
        FROM mytable
        GROUP BY date_trunc('hour', executiontime)
        )
SELECT cal.stamp
        , COALESCE(qqq.zcount, 0) AS zcount    -- hours without jobs show 0
FROM cal
LEFT JOIN qqq ON cal.stamp = qqq.stamp
ORDER BY stamp ASC
        ;

15 Comments

For some reason this gives the syntax error ERROR: syntax error at or near "WITH cal" LINE 8: WITH cal AS (, despite the query being the same as the CTE examples in the PostgreSQL documentation.
I have 8.3; that must be the problem (WITH queries were only added in 8.4). Thanks.
The standard trick is to replace the CTEs with (temporary) views, or plain subqueries, and refer to those (see the sketch after these comments), though the query loses a lot of its elegance that way. BTW: I would suggest upgrading to 9.1.x. It's only an hour of work or so, and it is worth it.
Then I'll install 9.1.x and try out the answer. By the way, is this suitable for very large data sets? I'm looking at ~2.5 million records in total, which come to ~10.5k rows if I retrieve them hour by hour for every single day, and this number may increase. My initial query retrieves the entire data set (a slightly longer timespan than the datetime parameters given in my question; the timestamps are parametric, and any two interval endpoints can be chosen via user input from the Qt front end) in about 5 seconds.
Don't forget to back up (you need to restore from a backup to do the migration!). Performance is not an issue; the planner/optimiser treats views and CTEs as part of the main query and happily reshuffles them as if they were ordinary subqueries. With the correct structure and tuning, sub-second timing is possible for (MegaRow * KiloRow) joins. BTW: 1 MRow is not large; that amount of data will still fit in the cache, so a typical query against a "warm" database needs virtually no disk I/O.
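
Picking up the comment above about replacing the CTEs: here is a minimal sketch of the accepted answer rewritten with plain subqueries, which should also run on PostgreSQL 8.3 (WITH was added in 8.4, and if I recall correctly the timestamp variant of generate_series was too, so the hourly calendar is built from an integer series instead). Table and column names are those from the question.

-- Same idea as the CTE answer, but with inline subqueries (8.3-compatible sketch)
SELECT cal.stamp
     , COALESCE(qqq.zcount, 0) AS zcount
FROM (
        -- hourly calendar derived from an integer series
        SELECT '2012-02-02 00:00:00'::timestamp + n * '1 hour'::interval AS stamp
        FROM generate_series(
               0
             , (EXTRACT(EPOCH FROM ('2012-03-02 05:00:00'::timestamp
                                  - '2012-02-02 00:00:00'::timestamp)) / 3600)::int
             ) AS n
     ) AS cal
LEFT JOIN (
        -- per-hour job counts
        SELECT date_trunc('hour', executiontime) AS stamp
             , count(*) AS zcount
        FROM mytable
        GROUP BY date_trunc('hour', executiontime)
     ) AS qqq ON cal.stamp = qqq.stamp
ORDER BY cal.stamp;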

The idea is to generate an array or a table with the dates in this period and join it with the job execution table, as in the sketch below.
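
As a rough sketch of that idea (assuming PostgreSQL 8.4 or later for the timestamp variant of generate_series; the table and column names are taken from the question), the hourly series can be joined and counted in one step:

SELECT h.stamp
     , count(m.executiontime) AS zcount   -- count() skips NULLs, so empty hours give 0
FROM generate_series('2012-02-02 00:00:00'::timestamp
                   , '2012-03-02 05:00:00'::timestamp
                   , '1 hour'::interval) AS h(stamp)
LEFT JOIN mytable AS m
       ON date_trunc('hour', m.executiontime) = h.stamp
GROUP BY h.stamp
ORDER BY h.stamp;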

