I have two DataFrames. I need to bring the value of df_A["Cycle"] into df_B wherever df_A["Date"] falls within the range from df_B["From_Date"] to df_B["To_Date"].


df_A:

Date         Cycle
07.02.2021    C01
08.02.2021    C01
14.02.2021    C02
15.06.2021    C02
28.02.2021    C03

df_B:

From_Date    To_Date
07.02.2021  13.02.2021
14.02.2021  27.02.2021
28.02.2021  03.03.2021

Desired Output:

df_B:

From_Date    To_Date    Cycle
07.02.2021  13.02.2021   C01
14.02.2021  27.02.2021   C02
28.02.2021  03.03.2021   C03 

So far I have tried np.dot, but it raises a shape ValueError. I found this piece of code online:

s1=Promo_Data["Date From"].values
s2=Promo_Data["Date to"].values
s=Cycle_Mapping["Date"].values[:,None]
Promo_Data["Cyc"]=np.dot((s>=s1)&(s<=s2),Cycle_Mapping["Cycle"])
  • The df_A dates all fall within df_B; could you kindly explain your logic better? (08.02.2021 falls between 07.02.2021 and 13.02.2021, yet it is excluded.) Commented Jul 9, 2021 at 0:14

1 Answer

df1:

        Date Cycle
0 2021-02-07   C01
1 2021-02-08   C01
2 2021-02-14   C02
3 2021-06-15   C02
4 2021-02-28   C03

df2:

   From_Date    To_Date
0 2021-02-07 2021-02-13
1 2021-02-14 2021-02-27
2 2021-02-28 2021-03-03

First, let's make sure that dates are of datetime type:

df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

Construct IntervalIndex for df2:

>>> df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'],closed='both')
>>> df2

                          From_Date    To_Date
[2021-02-07, 2021-02-13] 2021-02-07 2021-02-13
[2021-02-14, 2021-02-27] 2021-02-14 2021-02-27
[2021-02-28, 2021-03-03] 2021-02-28 2021-03-03
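
To see why this index helps, here is a small sketch of how label lookup behaves once df2 carries an IntervalIndex. A single timestamp selects the row whose interval contains it, and a timestamp outside every interval raises KeyError. The two sample timestamps and the names row, matched_interval and outside_found are chosen only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')
df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')

# A timestamp inside a range selects that row; .name is the matching interval.
row = df2.loc[pd.Timestamp('2021-02-08')]
matched_interval = row.name

# A timestamp outside every range raises KeyError.
try:
    df2.loc[pd.Timestamp('2021-06-15')]
    outside_found = True
except KeyError:
    outside_found = False
```

This relies on the ranges not overlapping, so that each date matches at most one row.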

Define a function that maps each Date in df1 to the df2 range containing it, and store that range in a new column of df1:

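def get_date(d):
    # Return the interval (the row label in df2) that contains date d,
    # or None when d falls outside every range.
    try:
        return df2.loc[d].name
    except KeyError:
        return None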

df1['index'] = df1['Date'].apply(get_date)

output:

        Date Cycle                     index
0 2021-02-07   C01  [2021-02-07, 2021-02-13]
1 2021-02-08   C01  [2021-02-07, 2021-02-13]
2 2021-02-14   C02  [2021-02-14, 2021-02-27]
3 2021-06-15   C02                       NaN
4 2021-02-28   C03  [2021-02-28, 2021-03-03]
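
For larger frames, the row-by-row apply can probably be replaced by a single vectorised lookup with IntervalIndex.get_indexer. This is a sketch of an alternative, not the approach used in this answer, and it assumes the ranges in df2 do not overlap. The names intervals, pos, matched and out are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

intervals = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')

# Position of the range containing each date; -1 where no range matches.
pos = intervals.get_indexer(pd.Index(df1['Date']))
matched = pos >= 0

# Pair every matched date with its range, then keep the first row per range.
out = df2.iloc[pos[matched]].reset_index(drop=True)
out['Cycle'] = df1.loc[matched, 'Cycle'].to_numpy()
out = out.drop_duplicates(['From_Date', 'To_Date']).reset_index(drop=True)
```

Here the date 15.06.2021 gets position -1 and is dropped before the ranges are paired with their cycles.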

Merge the two dataframes on "index" and filter the columns:

df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]

   From_Date    To_Date Cycle
0 2021-02-07 2021-02-13   C01
1 2021-02-07 2021-02-13   C01
2 2021-02-14 2021-02-27   C02
3 2021-02-28 2021-03-03   C03

If you want only the first df1 value for each range, group by the range columns and keep the first row. Assuming the merge result is stored in df3:

df3.groupby(['From_Date', 'To_Date'], as_index=False).first()

output:

   From_Date    To_Date Cycle
0 2021-02-07 2021-02-13   C01
1 2021-02-14 2021-02-27   C02
2 2021-02-28 2021-03-03   C03
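
Another way to reach the same one-row-per-range result, sketched here as an alternative under the assumption that the ranges do not overlap, is pd.merge_asof: attach to each date the latest range starting on or before it, discard dates that lie past that range's end, and keep the first hit per range. The names candidates, inside and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

# merge_asof needs both frames sorted on their keys. For each date it
# attaches the latest range whose From_Date is on or before that date.
candidates = pd.merge_asof(df1.sort_values('Date'), df2.sort_values('From_Date'),
                           left_on='Date', right_on='From_Date')

# Drop dates that lie after the end of their candidate range.
inside = candidates[candidates['Date'] <= candidates['To_Date']]

# Keep the earliest matching date for each range.
result = inside.drop_duplicates(['From_Date', 'To_Date'])
result = result[['From_Date', 'To_Date', 'Cycle']].reset_index(drop=True)
```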

Full code:

import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})

df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')

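def get_date(d):
    # Return the interval (the row label in df2) that contains date d,
    # or None when d falls outside every range.
    try:
        return df2.loc[d].name
    except KeyError:
        return None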

df1['index'] = df1['Date'].apply(get_date)

df3 = df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]

df3.groupby(['From_Date', 'To_Date'], as_index=False).first()
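
For comparison, the broadcasting idea from the question can be made to work without np.dot. The mask built there has one row per date and one column per range, so its last dimension does not line up with the Cycle vector, which is the likely source of the shape error. The following is a minimal sketch that picks the first matching date per range instead. The names dates, starts, ends, mask, first_hit and has_hit are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

dates = df1['Date'].to_numpy()[:, None]   # shape (5, 1): one row per date
starts = df2['From_Date'].to_numpy()      # shape (3,): one entry per range
ends = df2['To_Date'].to_numpy()

# mask[i, j] is True when date i lies inside range j; shape (5, 3).
mask = (dates >= starts) & (dates <= ends)

# For each range (column), the row of the first date inside it.
first_hit = mask.argmax(axis=0)
has_hit = mask.any(axis=0)

# Ranges with no matching date get a missing value instead of a cycle.
cycles = df1['Cycle'].to_numpy()
df2['Cycle'] = pd.Series(cycles[first_hit], index=df2.index).where(has_hit)
```

The mask holds len(df1) × len(df2) booleans, so this is practical only for moderately sized frames.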

2 Comments

Exactly what I needed, but I am getting an error ("DataFrame" object has no attribute "name") on the line return df2.loc[d].name
I had forgotten to copy one of the lines in the full code. Can you try to run it all at once?
