Cumulative Sum Query in SQL table with distinct elements

Question

I have a table like this, with columns for the date of sale, the insurance salesman's name, and the sale amount:

Date of Sale | Salesman Name | Sale Amount
2021-03-01   | Jack          | 40  
2021-03-02   | Mark          | 60
2021-03-03   | Sam           | 30 
2021-03-03   | Mark          | 70 
2021-03-02   | Sam           | 100

I want to group by the date of sale. The next column should display the cumulative count of the sellers who have made a sale up to that date, but a seller who has already been counted shouldn't be counted again.

For example, the following table is incorrect:

Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01   | 1                    | 40
2021-03-02   | 3                    | 200
2021-03-03   | 5                    | 300

The following table is correct:

Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01   | 1                    | 40
2021-03-02   | 3                    | 200
2021-03-03   | 3                    | 300

I am not sure how to frame the SQL query, because two conditions are involved here: the count has to be cumulative, and it has to ignore duplicates. I think the OVER clause with ROWS UNBOUNDED PRECEDING may be of some use here? I would appreciate your help.

Edit - I have added the Sale Amount as a column. I need the cumulative sum of the Sale Amount as well. In this case, all sale amounts should be counted, unlike the salesman names, where only unique names are counted.

Please specify the DBMS you are using. Different vendors support different features.
– NineBerry, Commented May 1, 2021 at 9:57

Gordon Linoff · Accepted Answer · 2021-05-01 12:01:41Z

2

The best way to do this uses window functions to determine the first time a salesperson appears. Then, you just want cumulative sums:

select saledate,
       -- inner sum: first appearances per day; outer sum: running total over days
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by saledate) as num_salespersons,
       sum(sum(sales)) over (order by saledate) as running_sales
from (select t.*,
             -- seqnum = 1 marks each salesperson's earliest sale
             row_number() over (partition by salesperson order by saledate) as seqnum
      from t
     ) t
group by saledate
order by saledate;

Note that, in addition to being more concise, this should have much better performance than a solution that uses a self-join.
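To check the idea against the sample data from the question, here is a minimal sketch that runs a version of the window-function query in an in-memory SQLite database from Python, with the first-appearance flag aggregated per day before the running sum. The table name t and the column names saledate, salesperson, and sales are assumptions taken from the query, not from the asker's schema, and it assumes the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

# Sample rows from the question. Table/column names are assumptions.
ROWS = [
    ("2021-03-01", "Jack", 40),
    ("2021-03-02", "Mark", 60),
    ("2021-03-03", "Sam", 30),
    ("2021-03-03", "Mark", 70),
    ("2021-03-02", "Sam", 100),
]

QUERY = """
select saledate,
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by saledate) as num_salespersons,
       sum(sum(sales)) over (order by saledate) as running_sales
from (select t.*,
             row_number() over (partition by salesperson order by saledate) as seqnum
      from t
     ) t
group by saledate
order by saledate
"""


def running_totals(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    # ISO dates stored as text sort correctly as strings.
    conn.execute("create table t (saledate text, salesperson text, sales integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in running_totals(ROWS):
        print(row)
```

On the sample rows this should reproduce the "correct" table from the question. Other databases may need small syntax changes, so treat it as a check of the logic rather than a drop-in query.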

answered May 1, 2021 at 12:01

Gordon Linoff


2 Comments

Dawson Smith Over a year ago

Hey, Thanks for this answer Gordon. Provides a completely new way of solving a complex query. Any good resources for learning about window functions?

Gordon Linoff Over a year ago

I would suggest that you start with the documentation for the database you are using.

Tim Biegeleisen · Accepted Answer · 2021-05-01 10:58:00Z

1

One approach uses a self join and aggregation:

WITH cte AS (
    SELECT t1.SaleDate,
           -- no earlier row for this salesman means this is a first appearance
           COUNT(CASE WHEN t2.Salesman IS NULL THEN 1 END) AS cnt,
           SUM(t1.SaleAmount) AS amt
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t2.Salesman = t1.Salesman AND
           t2.SaleDate < t1.SaleDate
    GROUP BY t1.SaleDate
)

SELECT
    SaleDate,
    SUM(cnt) OVER (ORDER BY SaleDate) AS NumSalesman,
    SUM(amt) OVER (ORDER BY SaleDate) AS TotalAmount
FROM cte
ORDER BY SaleDate;

The logic in the CTE is that, for each sale, we look for an earlier record for the same salesman. If there is no such record, the sale in question is that salesman's first appearance. We then aggregate by date to get the number of first appearances and the total amount per day, and finally take running sums of both in the outer query.
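The two stages described above can be inspected separately. The following is a minimal sketch, not a definitive implementation, that runs the per-day aggregation and then the running sums in an in-memory SQLite database from Python, using the sample rows from the question. The names yourTable, SaleDate, Salesman, and SaleAmount are the placeholder names from the query, and it assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Sample rows from the question. Table/column names are placeholders.
ROWS = [
    ("2021-03-01", "Jack", 40),
    ("2021-03-02", "Mark", 60),
    ("2021-03-03", "Sam", 30),
    ("2021-03-03", "Mark", 70),
    ("2021-03-02", "Sam", 100),
]

# Stage 1: per-day count of first appearances and per-day sales total.
PER_DAY = """
SELECT t1.SaleDate,
       COUNT(CASE WHEN t2.Salesman IS NULL THEN 1 END) AS cnt,
       SUM(t1.SaleAmount) AS amt
FROM yourTable t1
LEFT JOIN yourTable t2
    ON t2.Salesman = t1.Salesman AND
       t2.SaleDate < t1.SaleDate
GROUP BY t1.SaleDate
"""

# Stage 2: running sums over the per-day figures.
RUNNING = f"""
WITH cte AS ({PER_DAY})
SELECT SaleDate,
       SUM(cnt) OVER (ORDER BY SaleDate) AS NumSalesman,
       SUM(amt) OVER (ORDER BY SaleDate) AS TotalAmount
FROM cte
ORDER BY SaleDate
"""


def run_stages(rows):
    """Return (per-day rows, running-total rows) for the given sales."""
    conn = sqlite3.connect(":memory:")
    # ISO dates stored as text compare correctly with '<'.
    conn.execute(
        "CREATE TABLE yourTable (SaleDate TEXT, Salesman TEXT, SaleAmount INTEGER)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    per_day = conn.execute(PER_DAY + " ORDER BY t1.SaleDate").fetchall()
    running = conn.execute(RUNNING).fetchall()
    conn.close()
    return per_day, running


if __name__ == "__main__":
    per_day, running = run_stages(ROWS)
    print("per day:", per_day)
    print("running:", running)
```

In the sample data each salesman has at most one earlier row. If a salesman had several earlier rows, the join would repeat the t1 row once per match and inflate SUM(t1.SaleAmount), so this sketch only confirms the sample case.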

edited May 1, 2021 at 10:58

answered May 1, 2021 at 8:33

Tim Biegeleisen


18 Comments

Dawson Smith Over a year ago

Hey Tim, thanks for the answer. I have added one more column in my table. Can you suggest the changes that are required to be made in your query, for incorporating the sales amount column

Dawson Smith Over a year ago

Hey, will this ignore duplicate salesman names while not ignoring any of the sale amount rows?

Dawson Smith Over a year ago

Sure, will accept the answer, once the issue is resolved

Dawson Smith Over a year ago

Will amounts such as 70 and 100 in my table not be ignored?

Tim Biegeleisen Over a year ago

You should first try my updated answer and then ask questions about it afterwards.
