
I have a table of product/date transaction records and I would like to generate a summary table of monthly average product category revenue. The result would look like this:

┌───────────────────┬────────────────────────┐
│ product_category  │ avg_monthly_revenue    │
├───────────────────┼────────────────────────┤
│ 5000              │       65003.04         │
│ 5100              │        3301.95         │
│ 5200              │       99029.00         │
...

I'm looking for the average of the monthly category totals, but my naive attempt returns the average revenue per transaction within each month.

What I want is the category monthly totals, and then the average of those months.

I can do this with a subquery that aggregates with sum() and then applies avg(), but I think it can be accomplished with a clean little select using PostgreSQL window functions. Strangely, extensive googling has not turned up an applicable SO solution, so I'm posting a question.
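For reference, the subquery approach mentioned above might look like the following. This is only a sketch against the customer_transactions table described below; the names order_month, month_revenue and monthly_totals are made up for illustration.

```sql
-- Sketch only: the inner query totals revenue per category per month,
-- the outer query averages those monthly totals per category.
-- order_month, month_revenue and monthly_totals are hypothetical names.
select      product_category
            , avg(month_revenue)  avg_monthly_revenue

from        (
            select      product_category
                        , date_trunc('month', order_day)  order_month
                        , sum(revenue)                    month_revenue
            from        customer_transactions
            group by    product_category, date_trunc('month', order_day)
            ) monthly_totals

group by    product_category
order by    product_category
```

This is the two-step version (sum, then avg) that the window-function form is meant to collapse into a single select.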

Here is how the data is organized:

=#> \d customer_transactions
        Table "customer_transactions"
┌─────────────────┬────────────────┬───────────┐
│       Column    │      Type      │ Modifiers │
├─────────────────┼────────────────┼───────────┤
│ order_day       │ date           │ not null  │
│ product_id      │ text           │ not null  │
│ product_category│ text           │ not null  │
│ customer_id     │ bigint         │ not null  │
│ revenue         │ numeric(38,14) │           │
│ units           │ integer        │           │
└─────────────────┴────────────────┴───────────┘

=#> select * from customer_transactions limit 2;
order_day   product_id   product_category   customer_id revenue
24-MAR-16   A000BC2351   5000               44502       5.85
02-NOV-16   A000CB0182   5200               99833       99.50
...

Here is my naive attempt using window functions:

/* this doesn't work right, it generates the monthly transaction average*/
select      distinct product_category
            , avg(revenue) over (partition by date_trunc('month', order_day), product_category)  avg_cat_month_revenue

from        customer_transactions

Result: accurate, but not what I want. It is the per-transaction average for each category and month:
┌──────────────────┬─────────────────────────┐
│ product_category │   avg_cat_month_revenue │
├──────────────────┼─────────────────────────┤
│ 5000             │       12.0143           │
│ 5000             │       12.4989           │
...
│ 5100             │       13.5472           │
│ 5100             │       13.5643           │
...
  • I feel that this is the average per-customer order amount. What if you used select distinct product_category, SUM(revenue) as sales, date_trunc('month', order_day) as date from customer_transactions group by product_category, date order by date DESC; Commented Mar 27, 2017 at 21:26

1 Answer


I was able to intuit the solution with help from the Postgres documentation on window functions: the GROUP BY groups apply to the inner aggregate function, and the window is then applied to the result of that aggregate. From the documentation:

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after regular aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

select      distinct product_category
            , avg(sum(revenue)) over (partition by product_category)  avg_rev

from        customer_transactions

group by    date_trunc('month', order_day), product_category

The clever part is partitioning only over product_category, selecting only the category rather than every grouping column, and knowing that the groups apply only to the sum and not to the average.
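To check the query above on something small, here is a self-contained sketch that swaps the real table for a handful of made-up rows. The sample_transactions CTE and its values are invented for illustration and are not from the original data.

```sql
-- Hypothetical sample data: category 5000 has two transactions in March
-- and one in April; category 5100 has a single transaction in March.
with sample_transactions (order_day, product_category, revenue) as (
    values
          (date '2016-03-05', '5000', 10.00)
        , (date '2016-03-20', '5000', 20.00)
        , (date '2016-04-11', '5000', 50.00)
        , (date '2016-03-09', '5100',  8.00)
)
select      distinct product_category
            -- sum() runs per (month, category) group,
            -- avg() then runs over those sums within each category
            , avg(sum(revenue)) over (partition by product_category)  avg_rev

from        sample_transactions

group by    date_trunc('month', order_day), product_category
order by    product_category
```

For these rows the monthly totals for category 5000 are 30 (March) and 50 (April), so avg_rev should come out as 40, and as 8 for category 5100. A plain avg(revenue) over the three 5000 transactions would instead give about 26.67, which is the per-transaction average the question was trying to avoid.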

The result checks out against manually pivoting and averaging in Excel.

Caveats:

  • Use nullif(revenue, 0): a month whose total is 0 still counts in the denominator and drags the average down. If your table has zeros for transaction revenue, wrap the summand in nullif so that an all-zero month sums to NULL and avg() skips it.
  • You can't just add another avg across a different grain (for example, the average among categories by month), because the select groups around category.
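The nullif caveat can be seen side by side in a small sketch. The sample_transactions rows and the two output column names are made up for illustration.

```sql
-- Hypothetical sample data: April has only zero-revenue transactions.
with sample_transactions (order_day, product_category, revenue) as (
    values
          (date '2016-03-05', '5000', 30.00)
        , (date '2016-04-11', '5000',  0.00)
        , (date '2016-05-02', '5000', 50.00)
)
select      distinct product_category
            -- April's total of 0 counts as a month: (30 + 0 + 50) / 3
            , avg(sum(revenue)) over (partition by product_category)             avg_with_zero_months
            -- April's total becomes NULL and is skipped by avg(): (30 + 50) / 2
            , avg(sum(nullif(revenue, 0))) over (partition by product_category)  avg_without_zero_months

from        sample_transactions

group by    date_trunc('month', order_day), product_category
```

Whether a zero-revenue month should count toward the average depends on what the report is meant to show, so treat the nullif version as one option rather than the only correct one.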

2 Comments

Is it necessary to have both DISTINCT and GROUP BY? I've had no success grouping.
@CharlieClark Window functions usually return many duplicate rows, because the result is attached to every record being windowed. That's why the distinct is there. The group by operates on the sum(revenue) across all records. It's a pretty subtle one here; did you try an example?
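To see the duplicates described in the comment above, one option is to drop the distinct and select the grouping columns as well. This is a sketch on made-up rows; sample_transactions, order_month and month_revenue are hypothetical names.

```sql
-- Hypothetical sample data, two months for category 5000 and one for 5100.
with sample_transactions (order_day, product_category, revenue) as (
    values
          (date '2016-03-05', '5000', 10.00)
        , (date '2016-03-20', '5000', 20.00)
        , (date '2016-04-11', '5000', 50.00)
        , (date '2016-03-09', '5100',  8.00)
)
select      product_category
            , date_trunc('month', order_day)  order_month
            , sum(revenue)                    month_revenue
            -- the same category average is attached to every month row
            , avg(sum(revenue)) over (partition by product_category)  avg_rev

from        sample_transactions

group by    date_trunc('month', order_day), product_category
order by    product_category, order_month
```

This should return one row per (month, category) group, so category 5000 appears twice with the same avg_rev. Selecting only product_category and avg_rev leaves those rows identical, and that is the duplication the distinct removes.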
