Sum values in one dataframe based on date range in a second dataframe

Question

I have two dataframes (simplified examples below). One contains a series of dates and values (df1), the second contains a date range (df2). I would like to identify/select/mask the date range from df2 in df1, sum the associated df1 values and add them to a new column in df2.

I'm a novice and all the techniques I have tried have been unsuccessful--a combination of wrong method, combining incompatible methods, syntax errors and so on. I have searched the Q&As here, but none have quite addressed this issue.

import pandas as pd
#********** df1: dates and values ***********
rng = pd.date_range('2012-02-24', periods=12, freq='D')
df1 = pd.DataFrame({ 'STATCON': ['C00028', 'C00489', 'C00038', 'C00589', 'C10028', 'C00499', 'C00238', 'C00729',
                            'C10044', 'C00299', 'C00288', 'C00771'],
'Date': rng,
'Val': [0.96, 0.57, 0.39, 0.17, 0.93, 0.86, 0.54, 0.58, 0.43, 0.19, 0.40, 0.32]
                 })

#********** df2: date range ***********
df2 = pd.DataFrame({
                'BCON': ['B002', 'B004', 'B005'],
                'Start': ['2012-02-25', '2012-02-28', '2012-03-01'],
                'End': ['2012-02-29', '2012-03-04', '2012-03-06']
               })
df2[['Start','End']] = df2[['Start','End']].apply(pd.to_datetime)

#********** Desired Output:  df2 -- date range with summed values ***********
df3 = pd.DataFrame({
                'BCON': ['B002', 'B004', 'B005'],
                'Start': ['2012-02-25', '2012-02-28', '2012-03-01'],
                'End': ['2012-02-29', '2012-03-04', '2012-03-06'],
                'Sum_Val': [2.92, 3.53, 2.46]
               })

edouard · Accepted Answer · 2022-03-30 00:40:56Z

2

You can solve this with the Dataframe.apply function as follow:

def to_value(row):
  return df1[(row['Start'] <= df1['Date']) & (df1['Date'] <= row['End'])]['Val'].sum()

df3 = df2.copy()
df3['Sum_Val'] = df3.apply(to_value, axis=1)

The to_value function is called on every row of the df3 dataframe.

See here for a live implementation of the solution: https://1000words-hq.com/n/TcYN1Fz6Izp

edited Mar 30, 2022 at 0:40

answered Mar 29, 2022 at 21:09

edouard

1213 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

91TB Over a year ago

I had been trying various forms of your return statement before I asked for help in the forum. In my iterations I thought I would need to reference df2, as in df2['Start'] <= etc., but you do it with (row) in the function. How come df2 doesn't have to be explicitly stated?

edouard Over a year ago

During the d3.apply call, the to_value function is called sequentially with every row of d3. And since d3 is initialized with d2.copy(), it's essentially the rows in df2. You can test how the to_value function works by calling: to_value(df2.iloc[0,:])

sammywemmy · Accepted Answer · 2023-06-10 04:27:31Z

0

One option is with conditional_join from pyjanitor - it uses a binary search, which is usually better than a naive sequential scan (which can be memory consuming, depending on the data size):

# pip install pyjanitor
import pandas as pd
import numpy as np

df1['Date'] = pd.to_datetime(df1.Date)

(df1
.conditional_join(
    df2, 
    ('Date', 'Start', '>='), 
    ('Date', 'End', '<='), 
    how = 'right')
.groupby(df2.columns.tolist(), sort = False)
.Val
.sum()
)
BCON  Start       End       
B002  2012-02-25  2012-02-29    2.92
B004  2012-02-28  2012-03-04    3.53
B005  2012-03-01  2012-03-06    2.46
Name: Val, dtype: float64

edited Jun 10, 2023 at 4:27

answered Mar 29, 2022 at 22:36

sammywemmy

28.9k4 gold badges21 silver badges35 bronze badges

Collectives™ on Stack Overflow

Sum values in one dataframe based on date range in a second dataframe

2 Answers 2

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related