
I have a pandas DataFrame with a city name and a date, as follows:

In[34]: df.head(6)
Out[34]: 
       CITY        DATE
0    LONDON  2017-03-12
1    LONDON  2017-03-12
2     PARIS  2014-05-05
3     PARIS  2017-03-12
4    LONDON  2017-03-12
5  NEW-YORK  2017-03-12

I also have another DataFrame that maps a person to a city for a given time range (each row says that this person was in this city between the START date and the END date):

In[51]: db.head()
Out[51]: 
       CITY PERSON       START         END
0     PARIS    ID4  2014-01-01  2017-03-16
1  NEW-YORK    ID5  2014-01-07  2016-12-31
2    LONDON    ID1  2014-01-01  2016-05-08
3  MONTREAL    ID1  2016-05-09  2017-03-16
4     TOKYO    ID5  2017-01-01  2017-03-16
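To make the question reproducible, the two frames above can be rebuilt from the printed rows. This sketch assumes DATE, START and END are meant to be real dates, so they are parsed with pd.to_datetime; that makes comparisons chronological rather than string-based.

```python
import pandas as pd

# Rows copied from the df.head(6) output above; dates parsed as datetime64.
df = pd.DataFrame({
    "CITY": ["LONDON", "LONDON", "PARIS", "PARIS", "LONDON", "NEW-YORK"],
    "DATE": pd.to_datetime(["2017-03-12", "2017-03-12", "2014-05-05",
                            "2017-03-12", "2017-03-12", "2017-03-12"]),
})

# Rows copied from the db.head() output above.
db = pd.DataFrame({
    "CITY": ["PARIS", "NEW-YORK", "LONDON", "MONTREAL", "TOKYO"],
    "PERSON": ["ID4", "ID5", "ID1", "ID1", "ID5"],
    "START": pd.to_datetime(["2014-01-01", "2014-01-07", "2014-01-01",
                             "2016-05-09", "2017-01-01"]),
    "END": pd.to_datetime(["2017-03-16", "2016-12-31", "2016-05-08",
                           "2017-03-16", "2017-03-16"]),
})
```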

I would like to add a column to df that gives, for each row, the person who was in the given city on the given date.

I was able to achieve it using a custom function myfunc that I apply row-wise to df using df.apply(lambda x: myfunc(x['CITY'], x['DATE']), axis=1).

myfunc simply identifies in db the correct PERSON as follows:

def myfunc(city, date):
    return db.loc[(db.CITY==city) & (db.START <= date) & (db.END >= date), 'PERSON'].values[0]
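Note that with the sample rows shown in the question, .values[0] raises IndexError for rows such as LONDON on 2017-03-12, because no interval covers that date. Below is a sketch of a variant of myfunc that returns None when nothing matches; the data is the sample from the question with dates parsed by pd.to_datetime (an assumption about the real dtypes).

```python
import pandas as pd

# Sample data from the question, dates parsed as datetime64.
df = pd.DataFrame({
    "CITY": ["LONDON", "LONDON", "PARIS", "PARIS", "LONDON", "NEW-YORK"],
    "DATE": pd.to_datetime(["2017-03-12", "2017-03-12", "2014-05-05",
                            "2017-03-12", "2017-03-12", "2017-03-12"]),
})
db = pd.DataFrame({
    "CITY": ["PARIS", "NEW-YORK", "LONDON", "MONTREAL", "TOKYO"],
    "PERSON": ["ID4", "ID5", "ID1", "ID1", "ID5"],
    "START": pd.to_datetime(["2014-01-01", "2014-01-07", "2014-01-01",
                             "2016-05-09", "2017-01-01"]),
    "END": pd.to_datetime(["2017-03-16", "2016-12-31", "2016-05-08",
                           "2017-03-16", "2017-03-16"]),
})


def myfunc(city, date):
    """Return the PERSON in `city` on `date`, or None if no interval covers it."""
    mask = (db.CITY == city) & (db.START <= date) & (db.END >= date)
    matches = db.loc[mask, "PERSON"]
    # .values[0] would raise IndexError on an empty selection, so check first.
    return matches.iloc[0] if not matches.empty else None


# Still row-wise (one boolean scan of db per row of df), hence slow at scale.
df["PERSON"] = df.apply(lambda x: myfunc(x["CITY"], x["DATE"]), axis=1)
```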

This works, but it is rather slow for very large DataFrames. I was trying to merge the db data into df, or at least to write a vectorized version of the lookup that does not rely on row-wise application. Any help?
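The merge idea mentioned in the question can be sketched as an ordinary many-to-many merge on CITY followed by a filter on the interval bounds. This is a sketch under assumptions, not a drop-in solution: it assumes df has a default RangeIndex (so reset_index() creates a column named "index"), and the intermediate frame has one row per pair of (df row, db interval in the same city), so memory can grow quickly when a city has many intervals.

```python
import pandas as pd

# Sample data from the question, dates parsed as datetime64.
df = pd.DataFrame({
    "CITY": ["LONDON", "LONDON", "PARIS", "PARIS", "LONDON", "NEW-YORK"],
    "DATE": pd.to_datetime(["2017-03-12", "2017-03-12", "2014-05-05",
                            "2017-03-12", "2017-03-12", "2017-03-12"]),
})
db = pd.DataFrame({
    "CITY": ["PARIS", "NEW-YORK", "LONDON", "MONTREAL", "TOKYO"],
    "PERSON": ["ID4", "ID5", "ID1", "ID1", "ID5"],
    "START": pd.to_datetime(["2014-01-01", "2014-01-07", "2014-01-01",
                             "2016-05-09", "2017-01-01"]),
    "END": pd.to_datetime(["2017-03-16", "2016-12-31", "2016-05-08",
                           "2017-03-16", "2017-03-16"]),
})

# Pair each df row with every db interval of the same city; the "index"
# column remembers which df row each pair came from.
pairs = df.reset_index().merge(db, on="CITY")

# Keep only the intervals that actually contain DATE.
inside = pairs[(pairs["START"] <= pairs["DATE"]) & (pairs["DATE"] <= pairs["END"])]

# If intervals overlap, keep the first match per df row.
inside = inside.drop_duplicates(subset="index")

# Align back on df's row labels; rows with no covering interval get NaN.
df["PERSON"] = inside.set_index("index")["PERSON"]
```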

Answer


Use pd.merge_asof:

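  • df must be sorted by 'DATE'
  • db must be sorted by 'START' (sorting by 'END' as well only breaks ties between equal 'START' values)
  • the by parameter restricts matches to rows with the same 'CITY'
  • merge_asof pairs each 'DATE' with the latest 'START' on or before it, so a query at the end keeps only the rows where 'END' >= 'DATE'
  • the join keys must be numeric or datetime, so convert 'DATE', 'START' and 'END' with pd.to_datetime first if they are strings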
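The sorting requirement is enforced by pandas rather than being a performance hint. The following sketch uses two made-up toy rows (illustrative only, not from the question) to show that an unsorted left key raises ValueError, while the same call succeeds once the left frame is sorted.

```python
import pandas as pd

# Toy rows (hypothetical): DATE is deliberately NOT in ascending order.
left = pd.DataFrame({
    "CITY": ["PARIS", "PARIS"],
    "DATE": pd.to_datetime(["2017-03-12", "2014-05-05"]),
})
right = pd.DataFrame({
    "CITY": ["PARIS"],
    "PERSON": ["ID4"],
    "START": pd.to_datetime(["2014-01-01"]),
})

try:
    pd.merge_asof(left, right, left_on="DATE", right_on="START", by="CITY")
    unsorted_error = None
except ValueError as err:
    # pandas rejects unsorted join keys instead of returning wrong matches.
    unsorted_error = str(err)

# Sorting the left frame on its key first makes the same call succeed.
ok = pd.merge_asof(left.sort_values("DATE"), right,
                   left_on="DATE", right_on="START", by="CITY")
```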

import pandas as pd

pd.merge_asof(
    df.sort_values('DATE'),
    db.sort_values(['START', 'END']),
    left_on='DATE', right_on='START', by='CITY'
).query('DATE <= END')

    CITY       DATE PERSON      START        END
0  PARIS 2014-05-05    ID4 2014-01-01 2017-03-16
3  PARIS 2017-03-12    ID4 2014-01-01 2017-03-16

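Notice that only the 'PARIS' entries match with the data you provided: the LONDON interval for ID1 ended on 2016-05-08 and the NEW-YORK interval for ID5 ended on 2016-12-31, both before 2017-03-12, so the query removes those rows.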
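Below is a self-contained variant of the answer using the sample data. Two behaviours are assumptions about what you want rather than part of the answer above: unmatched rows are kept with PERSON set to NaN (using Series.where instead of query), and df's original row order is restored by carrying its index through the merge. It shares the answer's caveat: merge_asof only considers the single interval with the latest START on or before DATE in each city, so it assumes a city's intervals do not overlap.

```python
import pandas as pd

# Sample data from the question, dates parsed as datetime64.
df = pd.DataFrame({
    "CITY": ["LONDON", "LONDON", "PARIS", "PARIS", "LONDON", "NEW-YORK"],
    "DATE": pd.to_datetime(["2017-03-12", "2017-03-12", "2014-05-05",
                            "2017-03-12", "2017-03-12", "2017-03-12"]),
})
db = pd.DataFrame({
    "CITY": ["PARIS", "NEW-YORK", "LONDON", "MONTREAL", "TOKYO"],
    "PERSON": ["ID4", "ID5", "ID1", "ID1", "ID5"],
    "START": pd.to_datetime(["2014-01-01", "2014-01-07", "2014-01-01",
                             "2016-05-09", "2017-01-01"]),
    "END": pd.to_datetime(["2017-03-16", "2016-12-31", "2016-05-08",
                           "2017-03-16", "2017-03-16"]),
})

# Both sides sorted on their join keys; df's index is moved into an
# "index" column (assumes a default RangeIndex) to restore order later.
merged = pd.merge_asof(
    df.reset_index().sort_values("DATE"),
    db.sort_values("START"),
    left_on="DATE", right_on="START", by="CITY",
)

# Each row got the latest START <= DATE in its city; blank out PERSON
# where that interval had already ended (NaT comparisons are False too).
merged["PERSON"] = merged["PERSON"].where(merged["DATE"] <= merged["END"])

# merge_asof returns one row per left row, so this aligns 1:1 with df.
df["PERSON"] = merged.set_index("index")["PERSON"]
```

Keeping every row makes the result a straight replacement for the row-wise apply, whereas the query version returns only the matched rows in date order.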
