
The following query (tested with PostgreSQL 11.1) computes, for each customer/product combination, the following two values:

  • (A) the sum of sales value that the customer spent on this product
  • (B) the sum of sales value that the customer spent in the parent category of this product

It then divides A by B to obtain a metric called loyalty.

select
  pp.customer, pp.product, pp.category,
  pp.sales_product / pc.sales_category as loyalty
from (
    select
      t.household_key as customer,
      t.product_id as product,
      p.commodity as category,
      sum(t.sales_value) as sales_product
    from transaction_data t
    left join product p on p.product_id = t.product_id
    group by t.household_key, t.product_id, p.commodity
) pp
left join (
    select
      t.household_key as customer,
      p.commodity as category,
      sum(t.sales_value) as sales_category
    from transaction_data t
    left join product p on p.product_id = t.product_id
    group by t.household_key, p.commodity
) pc on pp.customer = pc.customer and pp.category = pc.category
;

Results are of this form:

customer      product    category     loyalty
---------------------------------------------
       1       tomato        food        0.01
       1         beef        food        0.02
       1   toothpaste     hygiene        0.04
       1   toothbrush     hygiene        0.03

My question is, instead of having to rely on two sub-queries which are then left-joined, would it be feasible with a single query using window functions instead?

I've tried something like the following, but it fails with the error: column "t.sales_value" must appear in the GROUP BY clause or be used in an aggregate function. I don't see how to fix this.

-- does not work
select
  t.household_key as customer,
  t.product_id as product,
  p.commodity as category,
  sum(t.sales_value) as sales_product,
  sum(t.sales_value) over (partition by t.household_key, p.commodity) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;

1 Answer

I don't know how to do this without using either a join or a subquery, but here is one way to do this with a subquery, using analytic functions:

WITH cte AS (
    SELECT
        t.household_key AS customer,
        t.product_id AS product,
        p.commodity AS category,
        SUM(t.sales_value) OVER (PARTITION BY t.household_key, t.product_id, p.commodity)
            AS sales_product,
        SUM(t.sales_value) OVER (PARTITION BY t.household_key, p.commodity)
            AS sales_category
    FROM transaction_data t
    LEFT JOIN product p
        ON p.product_id = t.product_id
)

SELECT
    t.customer,
    t.product,
    t.category,
    MAX(t.sales_product) / MAX(t.sales_category) AS loyalty
FROM cte t
GROUP BY
    t.customer,
    t.product,
    t.category;

The trick here is to make a single pass over your joined tables and use an analytic SUM to compute both aggregates with two different partitions: one over three columns (customer, product, category) and one over two columns (customer, category). Every row within a customer/product/category group then carries the same two totals, so we can group by those three columns and take the MAX of each total; any aggregate that returns one of the identical values would do.
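
As a possible alternative that avoids the CTE altogether, here is a minimal sketch that relies on the fact that PostgreSQL evaluates window functions after GROUP BY, so a window function can take an aggregate as its argument. It reuses the table and column names from the question. The NULLIF guard is my own addition (an assumption, not part of the original query): it returns NULL instead of raising a division-by-zero error when a category total is 0, and can be dropped if that cannot happen in your data.

```sql
-- Sketch only: same tables and columns as in the question.
select
  t.household_key as customer,
  t.product_id    as product,
  p.commodity     as category,
  -- inner sum(): sales per customer/product/category group (the GROUP BY below)
  -- outer sum() over (...): adds up those group totals per customer/category
  sum(t.sales_value)
    / nullif(                       -- assumption: guard against a zero category total
        sum(sum(t.sales_value)) over (partition by t.household_key, p.commodity),
        0
      ) as loyalty
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;
```

Note one difference from the two-subquery version: GROUP BY and PARTITION BY both put NULL categories into a single group, so products without a matching row in product still get a loyalty value here. In the original query, the join condition pp.category = pc.category never matches when the category is NULL, so loyalty comes out as NULL for those rows.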


1 Comment

It works but for some reason it seems slower than the original one. I'm digging in because there are some inconsistencies due to null values in the data.
