4

I am trying to sum the values of colA, over a date range based on "date" column, and store this rolling value in the new column "sum_col" But I am getting the sum of all rows (=100), not just those in the date range.

I can't use rolling or groupby by as my dates (in the real data) are not sequential (some days are missing)

Amy idea how to do this? Thanks.

# Create data frame
df = pd.DataFrame()

# Create datetimes and data
df['date'] = pd.date_range('1/1/2018', periods=100, freq='D')
df['colA']= 1
df['colB']= 2
df['colC']= 3


StartDate = df.date-  pd.to_timedelta(5, unit='D') 
EndDate= df.date

dfx=df
dfx['StartDate'] = StartDate
dfx['EndDate'] = EndDate

dfx['sum_col']=df[(df['date'] > StartDate) & (df['date'] <= EndDate)].sum()['colA']
dfx.head(50)
1
  • 1
    Can you post an example of the current output and your expected output? Commented Jul 21, 2018 at 11:05

2 Answers 2

5

I'm not sure whether you want 3 columns for the sum of colA, colB, colC respectively, or one column which sums all three, but here is an example of how you would sum the values for colA:

dfx['colAsum'] = dfx.apply(lambda x: df.loc[(df.date >= x.StartDate) & 
                                            (df.date <= x.EndDate), 'colA'].sum(), axis=1)

e.g. (withperiods=10):

        date  colA  colB  colC  StartDate    EndDate  colAsum
0 2018-01-01     1     2     3 2017-12-27 2018-01-01        1
1 2018-01-02     1     2     3 2017-12-28 2018-01-02        2
2 2018-01-03     1     2     3 2017-12-29 2018-01-03        3
3 2018-01-04     1     2     3 2017-12-30 2018-01-04        4
4 2018-01-05     1     2     3 2017-12-31 2018-01-05        5
5 2018-01-06     1     2     3 2018-01-01 2018-01-06        6
6 2018-01-07     1     2     3 2018-01-02 2018-01-07        6
7 2018-01-08     1     2     3 2018-01-03 2018-01-08        6
8 2018-01-09     1     2     3 2018-01-04 2018-01-09        6
9 2018-01-10     1     2     3 2018-01-05 2018-01-10        6
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks much, just what I needed!
1

If what I understand is correct:

for i in range(df.shape[0]):

    dfx.loc[i,'sum_col']=df[(df['date'] > StartDate[i]) & (df['date'] <= EndDate[i])].sum()['colA']

For example, in range (2018-01-01, 2018-01-06) the sum is 6.

date    colA    colB    colC    StartDate   EndDate sum_col
0   2018-01-01  1   2   3   2017-12-27  2018-01-01  1.0
1   2018-01-02  1   2   3   2017-12-28  2018-01-02  2.0
2   2018-01-03  1   2   3   2017-12-29  2018-01-03  3.0
3   2018-01-04  1   2   3   2017-12-30  2018-01-04  4.0
4   2018-01-05  1   2   3   2017-12-31  2018-01-05  5.0
5   2018-01-06  1   2   3   2018-01-01  2018-01-06  5.0
6   2018-01-07  1   2   3   2018-01-02  2018-01-07  5.0 
7   2018-01-08  1   2   3   2018-01-03  2018-01-08  5.0
8   2018-01-09  1   2   3   2018-01-04  2018-01-09  5.0
9   2018-01-10  1   2   3   2018-01-05  2018-01-10  5.0
10  2018-01-11  1   2   3   2018-01-06  2018-01-11  5.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.