1

I am trying to overlap timestamp from df2 with df1 time window. When ever there is no match I am getting the following error. how can I get the output with out the following error?

Error

KeyError: "[Timestamp('2022-01-01 03:12:02')] not in index"

input

from datetime import datetime, date
import pandas as pd

df1 = pd.DataFrame({'id': ['aa0', 'aa1', 'aa2', 'aa3'],
                   'number': [1, 2, 2, 1],
                   'color': ['blue', 'red', 'yellow', "green"],
                   'date1': [datetime(2022,1,1,1,1,1), 
                                  datetime(2022,1,1,2,4,1), 
                                  datetime(2022,1,1,3,8,1), 
                                  datetime(2022,1,1,4,12,1)],
                   'date2': [datetime(2022,1,1,2,1,1), 
                                datetime(2022,1,1,3,6,1), 
                                datetime(2022,1,1,3,10,1), 
                                datetime(2022,1,1,4,14,1)] })

input2

df2 = pd.DataFrame({'id': ['A', 'B', 'C', 'D'],
                   'value': [10,20,30,40],
                   'date': [datetime(2022,1,1,1,12,1), 
                                  datetime(2022,1,1,1,40,1), 
                                  datetime(2022,1,1,3,12,2), 
                                  datetime(2022,1,1,4,12,2)] })

Expected output

(2022-01-01 01:01:01, 2022-01-01 02:01:01]    15.0
(2022-01-01 04:12:01, 2022-01-01 04:14:01]    40.0

Code

idx = pd.IntervalIndex.from_arrays(pd.to_datetime(df1['date1']),
                                   pd.to_datetime(df1['date2']))
mapper = pd.Series(idx, index=idx)

df2.groupby(mapper[pd.to_datetime(df2['date'])].values)['value'].mean()

2 Answers 2

1

One option is with conditional_join from pyjanitor, which solves inequality joins such as this:

# pip install pyjanitor
import pandas as pd
import janitor

df1['date1'] = pd.to_datetime(df1['date1'])
df1['date2'] = pd.to_datetime(df1['date2'])
df2['date'] = pd.to_datetime(df2['date'])

(
df1
.filter(like='date')
.conditional_join(
    df2.filter(['value', 'date']), 
    ('date1', 'date', '<='), 
    ('date2', 'date', '>='))
.groupby(['date1', 'date2'])
.value
.mean()
)

date1                date2              
2022-01-01 01:01:01  2022-01-01 02:01:01    15.0
2022-01-01 04:12:01  2022-01-01 04:14:01    40.0
Name: value, dtype: float64
Sign up to request clarification or add additional context in comments.

Comments

0

I think I figured it out. It is not the best but works.

df1['date'] = pd.to_datetime(df1['date1']).dt.date
df2['date'] = pd.to_datetime(df2['dates']).dt.date
df3 = pd.merge(df1, df2, on=['date'], how='left')
mask = (df3['dates'] > df3['date1']) & (df3['dates'] < df3['date2'])
df4 = df3.loc[mask]
df4.groupby(['date1', 'date2'])['value'].mean()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.