The following query (tested with PostgreSQL 11.1) computes, for each customer/product combination:
- (A) the total sales value that the customer spent on this product
- (B) the total sales value that the customer spent in the parent category of this product
It then divides A by B to get a metric called loyalty.
select
    pp.customer, pp.product, pp.category,
    pp.sales_product / pc.sales_category as loyalty
from (
    select
        t.household_key as customer,
        t.product_id as product,
        p.commodity as category,
        sum(t.sales_value) as sales_product
    from transaction_data t
    left join product p on p.product_id = t.product_id
    group by t.household_key, t.product_id, p.commodity
) pp
left join (
    select
        t.household_key as customer,
        p.commodity as category,
        sum(t.sales_value) as sales_category
    from transaction_data t
    left join product p on p.product_id = t.product_id
    group by t.household_key, p.commodity
) pc on pp.customer = pc.customer and pp.category = pc.category
;
The results look like this:
customer product category loyalty
---------------------------------------------
1 tomato food 0.01
1 beef food 0.02
1 toothpaste hygiene 0.04
1 toothbrush hygiene 0.03
My question is: instead of relying on two sub-queries that are then left-joined, is it possible to get the same result with a single query using window functions?
I tried something like the following, but it fails with the error: column "t.sales_value" must appear in the GROUP BY clause or be used in an aggregate function. I don't see how to fix this.
-- does not work
select
    t.household_key as customer,
    t.product_id as product,
    p.commodity as category,
    sum(t.sales_value) as sales_product,
    sum(t.sales_value) over (partition by t.household_key, p.commodity) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;
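One direction that might address the error, offered as an unverified sketch rather than a confirmed answer: PostgreSQL evaluates window functions after grouping and aggregation, so a window function is allowed to take an aggregate as its argument. Under that assumption, the category total could be written as a window sum over the per-product sums. The sketch reuses the table and column names from the query above and has not been run against the actual data.

```sql
-- Sketch only: the inner sum() is the GROUP BY aggregate (A),
-- the outer sum(...) over (...) adds those per-product totals
-- up across each customer/category partition (B).
select
    t.household_key as customer,
    t.product_id as product,
    p.commodity as category,
    sum(t.sales_value)
        / sum(sum(t.sales_value)) over (partition by t.household_key, p.commodity) as loyalty
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;
```

One difference to keep in mind: rows where p.commodity is null (no matching product) fall into a single partition here, whereas the join condition pp.category = pc.category in the original query never matches on null and so yields a null loyalty for those rows.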