I am looking to map values between two DataFrames based on multiple conditions. I have a solution that works, but I am sure there is a more efficient way to achieve my goal. One DataFrame contains a column of employee IDs, df1['SN'], with shift dates, df1['shift_date']. Another describes the contract type, df2['con_type'], that each employee was on across a date range, df2[['con_start_date', 'con_end_date']]. What I want to do is look up, for each shift, the contract type the employee was on at the shift date.
df1:
SN shift_date
0 ID1 2020-01-02
1 ID1 2020-01-03
2 ID1 2020-01-06
3 ID1 2020-01-20
4 ID1 2020-01-21
5 ID2 2020-01-03
6 ID2 2020-01-04
df2:
SN con_start_date con_end_date con_type
0 ID1 2013-12-31 2020-01-07 FT
1 ID1 2020-01-08 2020-12-31 PT
2 ID2 2019-12-04 2020-12-31 FT
The desired outcome, df3:
SN shift_date con_type
0 ID1 2020-01-02 FT
1 ID1 2020-01-03 FT
2 ID1 2020-01-06 FT
3 ID1 2020-01-20 PT
4 ID1 2020-01-21 PT
5 ID2 2020-01-03 FT
6 ID2 2020-01-04 FT
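To make the example reproducible, here is a minimal sketch that builds the sample frames above. It assumes the date columns should be parsed to datetime64 with pd.to_datetime (in the real data they may be stored as strings), so that comparisons are done on dates rather than on text.

```python
import pandas as pd

# Shifts worked per employee (sample data from the question).
df1 = pd.DataFrame({
    "SN": ["ID1", "ID1", "ID1", "ID1", "ID1", "ID2", "ID2"],
    "shift_date": pd.to_datetime([
        "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-20",
        "2020-01-21", "2020-01-03", "2020-01-04",
    ]),
})

# Contract periods per employee; both ends of each range appear to be inclusive.
df2 = pd.DataFrame({
    "SN": ["ID1", "ID1", "ID2"],
    "con_start_date": pd.to_datetime(["2013-12-31", "2020-01-08", "2019-12-04"]),
    "con_end_date": pd.to_datetime(["2020-01-07", "2020-12-31", "2020-12-31"]),
    "con_type": ["FT", "PT", "FT"],
})
```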
My current solution, which works:
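# assumes the date columns are datetime64 (e.g. via pd.to_datetime)
df3 = df1.copy()  # copy once, before the loop, and write the results into df3
for _, row in df2.iterrows():
    filter1 = df3['SN'] == row['SN']
    filter2 = df3['shift_date'] >= row['con_start_date']
    filter3 = df3['shift_date'] <= row['con_end_date']  # end date is inclusive
    mask = filter1 & filter2 & filter3
    df3.loc[mask, 'con_type'] = row['con_type']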
However, while this works, I am convinced there is a better way to do it. Row-by-row iteration with iterrows is notoriously inefficient compared with vectorized methods :(. Also, if there is a better title, please let me know!
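For comparison, here is a minimal sketch of two vectorized alternatives: (1) join the shifts to the contracts on SN with DataFrame.merge and keep the pairs where Series.between says the shift falls inside the range, and (2) use pd.merge_asof to pick, per SN, the contract with the latest start date on or before each shift, then check the end date. Both sketches assume the date columns are datetime64, df1 has a default (unnamed) index, con_end_date is inclusive, and one employee's contract ranges never overlap. The helper names map_con_type_merge and map_con_type_asof are hypothetical.

```python
import pandas as pd

# Sample data from the question (dates parsed to datetime64).
df1 = pd.DataFrame({
    "SN": ["ID1"] * 5 + ["ID2"] * 2,
    "shift_date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-20",
                                  "2020-01-21", "2020-01-03", "2020-01-04"]),
})
df2 = pd.DataFrame({
    "SN": ["ID1", "ID1", "ID2"],
    "con_start_date": pd.to_datetime(["2013-12-31", "2020-01-08", "2019-12-04"]),
    "con_end_date": pd.to_datetime(["2020-01-07", "2020-12-31", "2020-12-31"]),
    "con_type": ["FT", "PT", "FT"],
})


def map_con_type_merge(shifts, contracts):
    """Hypothetical helper: join on SN, then keep the pairs whose date is in range."""
    # reset_index() keeps the original row labels in an 'index' column
    # (assumes the index is unnamed) so the result can be re-aligned later.
    pairs = shifts.reset_index().merge(contracts, on="SN", how="inner")
    # Both ends inclusive; assumes at most one contract matches each shift.
    in_range = pairs["shift_date"].between(pairs["con_start_date"], pairs["con_end_date"])
    con_type = pairs.loc[in_range].set_index("index")["con_type"]
    # Assigning a Series aligns on the index; shifts with no match get NaN.
    return shifts.assign(con_type=con_type)


def map_con_type_asof(shifts, contracts):
    """Hypothetical helper: latest contract start <= shift_date per SN via merge_asof."""
    # merge_asof requires both frames to be sorted by their 'on' keys.
    left = shifts.reset_index().sort_values("shift_date")
    right = contracts.sort_values("con_start_date")
    matched = pd.merge_asof(left, right, left_on="shift_date",
                            right_on="con_start_date", by="SN", direction="backward")
    # A backward match only guarantees start <= shift_date, so blank out
    # any shift that falls after the matched contract's end date.
    matched["con_type"] = matched["con_type"].mask(matched["shift_date"] > matched["con_end_date"])
    # Re-align to the original row order via the saved 'index' column.
    return shifts.assign(con_type=matched.set_index("index")["con_type"])


df3 = map_con_type_merge(df1, df2)
print(df3)
```

The merge version materializes every shift/contract pair for an employee, which is cheap when each employee has only a few contracts; the merge_asof version avoids that blow-up at the cost of the sort requirement.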