2

I have Dataframe 1:

Hotel   DateFrom    DateTo      Room
BBB     2019-10-29  2020-03-27  DHS
BBB     2020-03-28  2020-10-30  DHS
BBB     2020-10-31  2021-03-29  DHS
BBB     2021-03-30  2099-01-01  DHS

And Dataframe 2:

Hotel   DateFrom    DateTo      Room    Food
BBB     2020-03-01  2020-04-24  DHS     A
BBB     2020-04-25  2020-05-03  DHS     B
BBB     2020-05-04  2020-05-31  DHS     C
BBB     2020-06-01  2020-06-22  DHS     D
BBB     2020-06-23  2020-08-26  DHS     E
BBB     2020-08-27  2020-11-30  DHS     F

I need to check if each row in df1 and if df1_DateFrom is between df2_DateFrom and df2_DateTo. Then i need to get that food code from df2 to new column in df1 or as new df3 shown below.

The result would look like this:

df3:

    Hotel   DateFrom    DateTo      Room  Food
    BBB     2019-10-29  2020-03-27  DHS   
    BBB     2020-03-28  2020-10-30  DHS   A
    BBB     2020-10-31  2021-03-29  DHS   F 
    BBB     2021-03-30  2099-01-01  DHS

I would really appreciate any help with this. I am kinda new on Pandas and still learning and i must say that it is bit complicated for me.

1
  • Welcome to stackoverflow! You are possibly trying to achieve something similar to what was asked in this question stackoverflow.com/questions/31328014/… Commented May 26, 2020 at 15:07

2 Answers 2

3

You can do a cross merge and query:

# recommend dealing with datetime type:
df1['DateFrom'],df1['DateTo'] = pd.to_datetime(df1['DateFrom']),pd.to_datetime(df1['DateTo'])
df2['DateFrom'],df2['DateTo'] = pd.to_datetime(df2['DateFrom']),pd.to_datetime(df2['DateTo'])

new_df = (df1.reset_index().merge(df2, on=['Hotel','Room'],
                                  how='left', suffixes=['','_'])
             .query('DateFrom_ <= DateFrom <= DateTo_')
         )
df1['Food'] = new_df.set_index('index')['Food']

Output:

  Hotel   DateFrom     DateTo Room Food
0   BBB 2019-10-29 2020-03-27  DHS  NaN
1   BBB 2020-03-28 2020-10-30  DHS    A
2   BBB 2020-10-31 2021-03-29  DHS    F
3   BBB 2021-03-30 2099-01-01  DHS  NaN
Sign up to request clarification or add additional context in comments.

2 Comments

Thank You, worked perfectly - now i just need to figure out what exactly happens there :)
So I was reading about merging and I think i understand it. But I have one more question. If i merge two dataframes: (df1 has columns [hotel, room, guestName]) and (df2 has columns [hotel, room, dateFrom, dateTo]). When I make this: RESULT = df1.merge(df2, on=['hotel', 'room']) - in result I will get frame that has columns: [hotel, room, guestName, dateFrom, dateTo]. Is there a way to get in result matching rows from both dataframes but columns only from df1 (so that RESULT is [hotel, room, guestName]). I am asking because I would like to skip another filtering after merge.
0

Much less elegant then Quang Hoang's answer, but the solution using np.piecewise would look like this. See also https://stackoverflow.com/a/30630905/4873972

import pandas as pd
import numpy as np
from io import StringIO

# Creating the dataframes.
df1 = pd.read_table(StringIO("""
Hotel   DateFrom    DateTo      Room
BBB     2019-10-29  2020-03-27  DHS
BBB     2020-03-28  2020-10-30  DHS
BBB     2020-10-31  2021-03-29  DHS
BBB     2021-03-30  2099-01-01  DHS
"""), sep=r"\s+").convert_dtypes()

df1["DateFrom"] = pd.to_datetime(df1["DateFrom"])
df1["DateTo"] = pd.to_datetime(df1["DateTo"])

df2 = pd.read_table(StringIO("""
Hotel   DateFrom    DateTo      Room    Food
BBB     2020-03-01  2020-04-24  DHS     A
BBB     2020-04-25  2020-05-03  DHS     B
BBB     2020-05-04  2020-05-31  DHS     C
BBB     2020-06-01  2020-06-22  DHS     D
BBB     2020-06-23  2020-08-26  DHS     E
BBB     2020-08-27  2020-11-30  DHS     F
"""), sep=r"\s+").convert_dtypes()

df2["DateFrom"] = pd.to_datetime(df2["DateFrom"])
df2["DateTo"] = pd.to_datetime(df2["DateTo"])
# Avoid zero index for merging later on.
df2["id"] = np.arange(1, len(df2) +1 )

# Find matching indexes.
df1["df2_id"] = np.piecewise(
    np.zeros(len(df1)), 
    [(df1["DateFrom"].values >= start_date) & (df1["DateFrom"].values <= end_date) for start_date, end_date in zip(df2["DateFrom"].values, df2["DateTo"].values)], 
    df2.index.values
)

# Merge on matching indexes.
df1.merge(df2["Food"], left_on="df2_id", right_index=True, how="left")

Output:

  Hotel   DateFrom     DateTo Room Food
0   BBB 2019-10-29 2020-03-27  DHS  NaN
1   BBB 2020-03-28 2020-10-30  DHS    A
2   BBB 2020-10-31 2021-03-29  DHS    F
3   BBB 2021-03-30 2099-01-01  DHS  NaN

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.