2

I am working with some financial tick data. Given two example dataframes like this:

left_df =

Time                          Bid Price    Ask Price
2022-01-02 00:00:01.323597    100          101
2022-01-02 00:00:01.828502    100          101
2022-01-02 00:00:01.845020    100          101
2022-01-02 00:00:03.123567    100          101

right_df =

Time                          Bid Price    Ask Price
2022-01-02 00:00:01.110223    500          501
2022-01-02 00:00:01.800000    500          501
2022-01-02 00:00:03.100000    500          501

If I 'merge' left to right, I want the merged dataframe to look like this:

Time_left                     Time_right                    Bid Price_left    Ask Price_left    Bid Price_right    Ask Price_right
2022-01-02 00:00:01.323597    2022-01-02 00:00:01.110223    100               101               500                501
2022-01-02 00:00:01 828502    2022-01-02 00:00:01.800000    100               101               500                501                               
2022-01-02 00:00:01.845020    2022-01-02 00:00:01.800000    100               101               500                501
2022-01-02 00:00:03.123567    2022-01-02 00:00:03.100000    100               101               500                501

i.e. for each time_left x, get the most recent time_right y up to x, and y can be equal to x.

Whereas if I wanted to 'merge' right to left, the resulting dataframe should look like this:

Time_right                     Time_left                    Bid Price_right    Ask Price_right    Bid Price_left    Ask Price_left
2022-01-02 00:00:01.800000    2022-01-02 00:00:01.323597    500               501               100                101
2022-01-02 00:00:03.100000    2022-01-02 00:00:01.845020    500               501               100                101

What would be the most efficient way of doing this on a dataset that could potentially have tens of millions of rows?

1 Answer 1

2

Try this

# convert to datetime
left_df['Time'] = pd.to_datetime(left_df['Time'])
right_df['Time'] = pd.to_datetime(right_df['Time'])
# insert time_right column
right_df.insert(1, 'Time_right', right_df['Time'])
# merge_asof
df = pd.merge_asof(left_df, right_df, on='Time', suffixes=('_left','_right'))
print(df)
                        Time  Bid_Price_left  Ask_Price_left                   Time_right  Bid_Price_right  Ask_Price_right  
0 2022-01-02 00:00:01.323597             100             101   2022-01-02 00:00:01.110223              500              501  
1 2022-01-02 00:00:01.828502             100             101   2022-01-02 00:00:01.800000              500              501  
2 2022-01-02 00:00:01.845020             100             101   2022-01-02 00:00:01.800000              500              501  
3 2022-01-02 00:00:03.123567             100             101   2022-01-02 00:00:03.100000              500              501  

Source

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.