I am working with some financial tick data. Given two example dataframes like this:
left_df =
Time Bid Price Ask Price
2022-01-02 00:00:01.323597 100 101
2022-01-02 00:00:01.828502 100 101
2022-01-02 00:00:01.845020 100 101
2022-01-02 00:00:03.123567 100 101
right_df =
Time Bid Price Ask Price
2022-01-02 00:00:01.110223 500 501
2022-01-02 00:00:01.800000 500 501
2022-01-02 00:00:03.100000 500 501
If I 'merge' left to right, I want the merged dataframe to look like this:
Time_left Time_right Bid Price_left Ask Price_left Bid Price_right Ask Price_right
2022-01-02 00:00:01.323597 2022-01-02 00:00:01.110223 100 101 500 501
2022-01-02 00:00:01 828502 2022-01-02 00:00:01.800000 100 101 500 501
2022-01-02 00:00:01.845020 2022-01-02 00:00:01.800000 100 101 500 501
2022-01-02 00:00:03.123567 2022-01-02 00:00:03.100000 100 101 500 501
i.e. for each time_left x, get the most recent time_right y up to x, and y can be equal to x.
Whereas if I wanted to 'merge' right to left, the resulting dataframe should look like this:
Time_right Time_left Bid Price_right Ask Price_right Bid Price_left Ask Price_left
2022-01-02 00:00:01.800000 2022-01-02 00:00:01.323597 500 501 100 101
2022-01-02 00:00:03.100000 2022-01-02 00:00:01.845020 500 501 100 101
What would be the most efficient way of doing this on a dataset that could potentially have tens of millions of rows?