I have a dataset, part of which looks like this:
| customer | product | price | quantity | sale_time |
|---|---|---|---|---|
| C060235 | P0204 | 6.99 | 2 | 2024-03-11 08:24:11 |
| C045298 | P0167 | 14.99 | 1 | 2024-03-11 08:35:06 |
| ... | ||||
| C039877 | P0024 | 126.95 | 1 | 2024-09-30 21:18:45 |
What I want is a list of unique customer, product pairs with the total sales, so something like:
| customer | product | total |
|---|---|---|
| C0000105 | P0168 | 643.78 |
| C0000105 | P0204 | 76.88 |
| ... | ||
| C1029871 | P1680 | 435.44 |
Here's my attempt at constructing this. This gives me the grand total of all sales, which isn't what I want. What's a correct approach?
import polars as pl
db.select(
(
pl.col('customer'),
pl.col('product'),
pl.col('quantity').mul(pl.col('price')).alias('total')
)
).group_by(('customer', 'product'))