
The two data frames are given below:

df1

     Start Date  End Date                   
 0   20110706    20110803                   
 1   20110803    20110907   

df2

     DATE       50      51      52      53      54  
  0  20110706   3.51    2.51    1.51    0.51    0   
  1  20110801   10.98   9.98    8.98    7.98    6.98    
  2  20110808   9.45    8.45    7.45    6.45    5.45    
  3  20110906   0       1       23.2    0       1.2 
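
For reference, a minimal sketch to reconstruct the two frames (assuming the dates are stored as plain YYYYMMDD integers and the value columns are labelled '50' through '54'):

import pandas as pd

# Hypothetical reconstruction of the example frames; dates are plain YYYYMMDD integers
df1 = pd.DataFrame({'Start Date': [20110706, 20110803],
                    'End Date':   [20110803, 20110907]})

df2 = pd.DataFrame({'DATE': [20110706, 20110801, 20110808, 20110906],
                    '50': [3.51, 10.98, 9.45, 0],
                    '51': [2.51, 9.98, 8.45, 1],
                    '52': [1.51, 8.98, 7.45, 23.2],
                    '53': [0.51, 7.98, 6.45, 0],
                    '54': [0, 6.98, 5.45, 1.2]})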

Based on df1, how can I modify df2 so that its columns are summed over the date ranges defined by df1, grouping each df2 DATE into the range it falls within (left-inclusive on Start Date)?

The modified df2 should keep the Start Date and End Date columns and bin the dates left-inclusively:

       Start Date  End Date    50      51      52      53      54
  0    20110706    20110803   14.49   12.49   10.49   8.49    6.98
  1    20110803    20110907    9.45    9.45    30.65   6.45    6.65

How can this be accomplished?

  • So to be clear, you want to do basically an inner join on the DATE key in df2 such that it is within the Start/End Date range? Commented Oct 14, 2015 at 20:54
  • @Tgsmith61591, Correct, and also summing the values of the dates within the range. Commented Oct 14, 2015 at 20:57
  • You've tagged this as excel; are you wanting an answer specific to Excel or to pandas? Commented Oct 14, 2015 at 22:21
  • @EdChum, I made the correction, I removed the excel tag. Thanks Commented Oct 14, 2015 at 22:45
  • @EdChum, I reformatted the Date strings into YYYYMMDD numbers, thanks for the notification; maybe this will make the solving process simpler. Commented Oct 15, 2015 at 12:21

1 Answer


Since dates appear in both Start Date and End Date (each row's End Date is the next row's Start Date), it's not clear what to do with dates in df2 that fall exactly on a boundary: should the ranges be left-inclusive or right-inclusive? Assuming left-inclusive, you can do:

# Convert the date columns to datetimes and index df1 by Start Date, df2 by DATE
df1['Start Date'] = pd.DatetimeIndex(df1['Start Date'])
df1.set_index('Start Date', inplace=True)

df2['DATE'] = pd.to_datetime(df2.DATE)
df2.set_index('DATE', inplace=True)

# Group each df2 row under the latest Start Date on or before its DATE, sum, and align with df1
sums = df2.groupby(df1.index.asof).sum()
pd.concat([df1, sums], axis=1)
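
In other words, df1.index.asof maps each date in df2's index to the latest Start Date on or before it, which gives exactly the left-inclusive grouping assumed above; the summed rows are then aligned back onto df1 by that Start Date index in the final concat.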

Comments

I am assuming left-inclusive; thanks for the verification!
When I do this it results in NaN, or empty values for all the numbers. Maybe it is a conversion issue?
Hmm, works for me. The key step is the second-to-last line. First check that the index on both frames is a datetime index. Then check the output of map(df1.index.asof, df2.index). This is the array indicating the groups (df1.index.asof is a function which is applied to the index of df2). For each date in df2.index the output should be the latest date from df1.index on or before that date.
For some reason df2['DATE'] = pd.to_datetime(df2.DATE) makes the df2 DATE index come out as 1970-01-01 00:00:00.020110706.
Found the issue! Replacing df2['DATE'] = pd.to_datetime(df2.DATE) with df2['DATE'] = pd.to_datetime(df2["DATE"], format="%Y%m%d") resolves it! Thanks for your help!
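
Putting the answer together with the format fix from the last comment, a complete sketch, continuing from the raw frames with YYYYMMDD integer dates reconstructed in the question, would be:

# Parse the YYYYMMDD integers explicitly so they are not misread as nanoseconds since the epoch
df1['Start Date'] = pd.to_datetime(df1['Start Date'], format='%Y%m%d')
df1.set_index('Start Date', inplace=True)

df2['DATE'] = pd.to_datetime(df2['DATE'], format='%Y%m%d')
df2.set_index('DATE', inplace=True)

# Group each df2 row under the latest Start Date on or before its DATE (left-inclusive), then sum
sums = df2.groupby(df1.index.asof).sum()
result = pd.concat([df1, sums], axis=1)
# result should match the expected output in the question (End Date is left as integers here)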