
The two data frames are given below:

df1

     Start Date  End Date                   
 0   20110706    20110803                   
 1   20110803    20110907   

df2

     DATE       50      51      52      53      54  
  0  20110706   3.51    2.51    1.51    0.51    0   
  1  20110801   10.98   9.98    8.98    7.98    6.98    
  2  20110808   9.45    8.45    7.45    6.45    5.45    
  3  20110906   0       1       23.2    0       1.2 
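
For reference, a minimal sketch to reconstruct the two frames (assuming the dates are stored as plain YYYYMMDD integers and the value columns are labelled '50' through '54'):

import pandas as pd

# Hypothetical reconstruction of the example frames; dates are plain YYYYMMDD integers
df1 = pd.DataFrame({'Start Date': [20110706, 20110803],
                    'End Date':   [20110803, 20110907]})

df2 = pd.DataFrame({'DATE': [20110706, 20110801, 20110808, 20110906],
                    '50': [3.51, 10.98, 9.45, 0],
                    '51': [2.51, 9.98, 8.45, 1],
                    '52': [1.51, 8.98, 7.45, 23.2],
                    '53': [0.51, 7.98, 6.45, 0],
                    '54': [0, 6.98, 5.45, 1.2]})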

Based on df1, how can I modify df2 so that its columns are summed over the date ranges defined by df1, grouping each df2 DATE into the range it falls within (left-inclusive on Start Date)?

The modified df2 should keep the Start Date and End Date columns and bin the dates left-inclusively:

       Start Date  End Date    50      51      52      53      54
  0    20110706    20110803   14.49   12.49   10.49   8.49    6.98
  1    20110803    20110907    9.45    9.45    30.65   6.45    6.65

How can this be accomplished?

  • So to be clear, you want to do basically an inner join on the DATE key in df2 such that it is within the Start/End Date range? Commented Oct 14, 2015 at 20:54
  • @Tgsmith61591, Correct, and also summing the values of the dates within the range. Commented Oct 14, 2015 at 20:57
  • You've tagged this as excel; are you wanting an answer specific to Excel or to pandas? Commented Oct 14, 2015 at 22:21
  • @EdChum, I made the correction, I removed the excel tag. Thanks Commented Oct 14, 2015 at 22:45
  • @EdChum, I reformatted the Date strings into YYYYMMDD numbers, thanks for the notification; maybe this will make the solving process simpler. Commented Oct 15, 2015 at 12:21

1 Answer


Since dates appear in both Start Date and End Date (each row's End Date is the next row's Start Date), it's not clear what to do with dates in df2 that fall exactly on a boundary: should the ranges be left-inclusive or right-inclusive? Assuming left-inclusive, you can do:

# Convert the date columns to datetimes and index df1 by Start Date, df2 by DATE
df1['Start Date'] = pd.DatetimeIndex(df1['Start Date'])
df1.set_index('Start Date', inplace=True)

df2['DATE'] = pd.to_datetime(df2.DATE)
df2.set_index('DATE', inplace=True)

# Group each df2 row under the latest Start Date on or before its DATE, sum, and align with df1
sums = df2.groupby(df1.index.asof).sum()
pd.concat([df1, sums], axis=1)
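
In other words, df1.index.asof maps each date in df2's index to the latest Start Date on or before it, which gives exactly the left-inclusive grouping assumed above; the summed rows are then aligned back onto df1 by that Start Date index in the final concat.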

Comments

I am assuming left-inclusive; thanks for the verification!
When I do this it results in NaN, or empty values for all the numbers. Maybe it is a conversion issue?
Hmm, works for me. The key step is the second-to-last line. First check that the index on both frames is a datetime index. Then check the output of map(df1.index.asof, df2.index). This is the array indicating the groups (df1.index.asof is a function which is applied to the index of df2). For each date in df2.index the output should be the latest date from df1.index on or before that date.
For some reason df2['DATE'] = pd.to_datetime(df2.DATE) makes the df2 DATE index come out as 1970-01-01 00:00:00.020110706.
Found the issue! Replacing df2['DATE'] = pd.to_datetime(df2.DATE) with df2['DATE'] = pd.to_datetime(df2["DATE"], format="%Y%m%d") resolves it! Thanks for your help!
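
Putting the answer together with the format fix from the last comment, a complete sketch, continuing from the raw frames with YYYYMMDD integer dates reconstructed in the question, would be:

# Parse the YYYYMMDD integers explicitly so they are not misread as nanoseconds since the epoch
df1['Start Date'] = pd.to_datetime(df1['Start Date'], format='%Y%m%d')
df1.set_index('Start Date', inplace=True)

df2['DATE'] = pd.to_datetime(df2['DATE'], format='%Y%m%d')
df2.set_index('DATE', inplace=True)

# Group each df2 row under the latest Start Date on or before its DATE (left-inclusive), then sum
sums = df2.groupby(df1.index.asof).sum()
result = pd.concat([df1, sums], axis=1)
# result should match the expected output in the question (End Date is left as integers here)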