I have two DataFrames derived from CSV files.

df1

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  

df2

 |sId  |BID    |StartDateTime      |EndDateTime        
0|10007|1002867|2019-07-26 05:11:05|2019-10-05 21:50:55
1|10006|1002147|2019-08-18 05:11:05|2019-10-05 21:50:55
2|10006|1002147|2019-10-05 21:50:55|2019-11-06 21:50:28
3|10006|1002147|2019-10-06 21:50:28|2019-10-08 03:56:20
4|10006|1002147|2019-10-08 03:56:20|2019-10-09 03:50:35
5|10006|1002147|2019-10-09 03:50:35|2019-10-10 05:12:30
6|10006|1002147|2019-10-10 05:12:30|2019-10-11 05:12:38
7|10009|1002348|2019-09-26 04:21:12|2019-10-06 04:16:00
8|10009|1002348|2019-10-06 04:16:00|2019-10-07 04:11:38
9|10009|1002348|2019-10-07 04:11:38|2019-10-08 04:13:12

Note that the two DataFrames are not the same length.

I want to add the columns sId, StartDateTime and EndDateTime from df2 to df1, but only for rows where the following condition holds:

df1.BID == df2.BID and df1.Datetime is between df2.StartDateTime and df2.EndDateTime

My result should look like this:

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId|sId  |StartDateTime      |EndDateTime        
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  |10006|2019-08-18 05:11:05|2019-10-05 21:50:55
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  |10006|2019-10-05 21:50:55|2019-11-06 21:50:28
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  |NA   |NA                 |NA                 

I have tried using the method from this post: Create column based on multiple column conditions from another dataframe

However, I get only sId in my result, not StartDateTime and EndDateTime. How can I get these columns in my result as well?

Tried code:

for key, grp in df2.groupby('sId'):
    cols = ['BID', 'StartDateTime', 'EndDateTime']
    masks = (df1['BID'].eq(bid) & df1['Datetime'].between(start, end) for bid, start, end in grp[cols].itertuples(index=False))
    df1.loc[pd.concat(masks, axis=1).any(axis=1), 'sId'] = key

df1['sId'] = df1['sId'].fillna('NA')
print(df1)

This prints out only:

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId|sId  
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  |10007
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  |10007
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  |10007
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 |10007
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  |10006
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  |10006
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  |NA   

1 Answer

Assuming that 'sId' in df2 is always populated, the code below produces exactly the desired result:

df3 = pd.merge(df1, df2, on='BID', how="left")
result = df3[df3['Datetime'].between(df3.StartDateTime, df3.EndDateTime) | df3.sId.isna()]
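
For anyone who wants to try the two lines above in isolation, here is a minimal sketch on a cut-down copy of the question's data. The trimmed frames (four rows of df1, three of df2, fewer columns) and the name `in_range` are assumptions made for illustration; the merge and the filter are the same as in the answer.

```python
import pandas as pd

# Cut-down copies of the question's frames (assumption: fewer rows and columns).
df1 = pd.DataFrame({
    "BID": [1002867, 1002147, 1002147, 1001257],
    "Datetime": pd.to_datetime([
        "2019-08-19 01:27:53", "2019-08-19 01:26:21",
        "2019-10-19 01:31:57", "2019-08-19 01:49:54",
    ]),
    "TrId": [1459, 2764, 2765, 11524],
})
df2 = pd.DataFrame({
    "sId": [10007, 10006, 10006],
    "BID": [1002867, 1002147, 1002147],
    "StartDateTime": pd.to_datetime([
        "2019-07-26 05:11:05", "2019-08-18 05:11:05", "2019-10-05 21:50:55",
    ]),
    "EndDateTime": pd.to_datetime([
        "2019-10-05 21:50:55", "2019-10-05 21:50:55", "2019-11-06 21:50:28",
    ]),
})

# Left merge pairs every df1 row with every df2 interval sharing its BID.
df3 = pd.merge(df1, df2, on="BID", how="left")

# Keep pairs whose Datetime lies inside the interval, plus rows whose BID
# had no counterpart in df2 at all (sId is NaN after the left merge).
in_range = df3["Datetime"].between(df3["StartDateTime"], df3["EndDateTime"])
result = df3[in_range | df3["sId"].isna()].reset_index(drop=True)
print(result)
```

One thing to be aware of: a df1 row whose BID exists in df2 but whose Datetime falls outside every interval for that BID fails both parts of the filter, so it is dropped rather than kept with NA values.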

1 Comment

When I tried this method with large files, I got a MemoryError. Is there a way to overcome this with the same method?
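
One possible way to keep the same merge-then-filter method while lowering peak memory is to run it on slices of df1 and concatenate the filtered pieces, so the full BID cross-match never has to exist in memory at once. The sketch below is only an illustration under that assumption: the function name `interval_join_chunked` and the `chunk_size` parameter are hypothetical, and whether it is enough depends on how many intervals each BID has in df2.

```python
import pandas as pd

def interval_join_chunked(df1, df2, chunk_size=100_000):
    """Merge-then-filter from the answer, applied to df1 one slice at a time.

    Hypothetical helper: only one slice's merged frame is held in memory
    before it is filtered down.
    """
    pieces = []
    for start in range(0, len(df1), chunk_size):
        chunk = df1.iloc[start:start + chunk_size]
        merged = chunk.merge(df2, on="BID", how="left")
        keep = (
            merged["Datetime"].between(merged["StartDateTime"], merged["EndDateTime"])
            | merged["sId"].isna()
        )
        pieces.append(merged[keep])
    if not pieces:  # empty df1: fall back to a plain (empty) merge
        return df1.merge(df2, on="BID", how="left")
    return pd.concat(pieces, ignore_index=True)

# Small demonstration data (assumption: trimmed from the question).
df1 = pd.DataFrame({
    "BID": [1002867, 1002147, 1002147, 1001257],
    "Datetime": pd.to_datetime([
        "2019-08-19 01:27:53", "2019-08-19 01:26:21",
        "2019-10-19 01:31:57", "2019-08-19 01:49:54",
    ]),
})
df2 = pd.DataFrame({
    "sId": [10007, 10006, 10006],
    "BID": [1002867, 1002147, 1002147],
    "StartDateTime": pd.to_datetime([
        "2019-07-26 05:11:05", "2019-08-18 05:11:05", "2019-10-05 21:50:55",
    ]),
    "EndDateTime": pd.to_datetime([
        "2019-10-05 21:50:55", "2019-10-05 21:50:55", "2019-11-06 21:50:28",
    ]),
})

chunked = interval_join_chunked(df1, df2, chunk_size=2)
print(chunked)
```

A smaller `chunk_size` trades speed for a lower memory peak. Passing only the columns of df2 that are actually needed (here sId, BID, StartDateTime, EndDateTime) should also reduce the size of each intermediate merge.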
