
I have a pandas dataframe where I want to sum the values between certain irregularly spaced dates. So for example, I have something like this:

date widgets
2021-03-01 1
2021-03-02 0
2021-03-03 1
2021-03-04 3
2021-03-05 1
2021-03-06 2

I know that the cutoff dates for the sums are 2021-03-01, 2021-03-04 and 2021-03-05. For each cutoff date, I want the sum of all widgets produced from that date up to (but not including) the next cutoff date, stored on the cutoff date's row. All dates that are not cutoff dates should get a value of 0.

This would produce a new column like below.

date widgets sums
2021-03-01 1 2
2021-03-02 0 0
2021-03-03 1 0
2021-03-04 3 3
2021-03-05 1 3
2021-03-06 2 0
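To make the rule concrete, here is a minimal plain-Python sketch (no pandas) that reproduces the expected `sums` column. It is only a reference for checking the rule, not a pandas solution, and it assumes the rows are already sorted by date; the names `dates`, `widgets`, `cutoffs` and `sums` are hypothetical.

```python
# Hypothetical reference implementation of the rule, using plain lists.
# Assumes the rows are sorted by date in ascending order.
dates = ["2021-03-01", "2021-03-02", "2021-03-03",
         "2021-03-04", "2021-03-05", "2021-03-06"]
widgets = [1, 0, 1, 3, 1, 2]
cutoffs = {"2021-03-01", "2021-03-04", "2021-03-05"}

sums = [0] * len(dates)
current = None  # index of the most recent cutoff row seen so far
for i, (d, w) in enumerate(zip(dates, widgets)):
    if d in cutoffs:
        current = i  # a new cutoff starts a new window
    if current is not None:
        sums[current] += w  # credit this row's widgets to the open window

print(sums)  # [2, 0, 0, 3, 3, 0]
```

For example, 2021-03-01 collects 1 + 0 + 1 = 2 from the first three rows, and 2021-03-05 collects 1 + 2 = 3.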

How can I do this within pandas?


You can try the following. First, convert the date column to datetime:

import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # convert to datetime

Then flag the rows whose date is a cutoff, take the cumulative sum of those flags, and use it as the grouping key (this assumes the frame is sorted by date):

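cutoffs = ['2021-03-01', '2021-03-04', '2021-03-05']
# True on rows whose date is one of the cutoffs
cond = df['date'].isin(pd.to_datetime(cutoffs))
# cond.cumsum() labels each row with its most recent cutoff; sum per label,
# then keep the total only on the cutoff rows and put 0 everywhere else
df['sums'] = df.groupby(cond.cumsum())['widgets'].transform('sum').where(cond, 0)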

print(df)

        date  widgets    sums
0 2021-03-01        1       2
1 2021-03-02        0       0
2 2021-03-03        1       0
3 2021-03-04        3       3
4 2021-03-05        1       3
5 2021-03-06        2       0
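A self-contained version that rebuilds the sample frame from the question and prints the intermediate grouping key may make the cumsum trick easier to follow. Like the answer, it assumes the frame is sorted by date and that every cutoff appears as an exact date in the data; `key` is just an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-03",
             "2021-03-04", "2021-03-05", "2021-03-06"],
    "widgets": [1, 0, 1, 3, 1, 2],
})
df["date"] = pd.to_datetime(df["date"])

cutoffs = ["2021-03-01", "2021-03-04", "2021-03-05"]
cond = df["date"].isin(pd.to_datetime(cutoffs))

# The running count of cutoff flags increases by one at each cutoff row,
# so every row is labelled with the window its most recent cutoff opened
key = cond.cumsum()
print(key.tolist())  # [1, 1, 1, 2, 3, 3]

# Broadcast each window's total to its rows, then keep it only on cutoff rows
df["sums"] = df.groupby(key)["widgets"].transform("sum").where(cond, 0)
print(df["sums"].tolist())  # [2, 0, 0, 3, 3, 0]
```

Rows before the first cutoff (if any) would fall into group 0 and, because `cond` is False there, would also end up with 0.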