2

I have two pandas dataframes matches with columns (match_id, team_id,date, ...) and teams_att with columns (id, team_id, date, overall_rating, ...). I want to join the two dataframes on matches.team_id = teams_att.team_id and teams_att.date closest to matches.date

Example

matches

match_id    team_id     date
1           101         2012-05-17
2           101         2014-07-11
3           102         2010-05-21
4           102         2017-10-24

teams_att

id  team_id     date        overall_rating
1   101         2010-02-22  67
2   101         2011-02-22  69
3   101         2012-02-20  73
4   101         2013-09-17  79
5   101         2014-09-10  74
6   101         2015-08-30  82
7   102         2015-03-21  42
8   102         2016-03-22  44

Desired results

match_id    team_id     matches.date    teams_att.date      overall_rating
1           101         2012-05-17      2012-02-20          73
2           101         2014-07-11      2014-09-10          74
3           102         2010-05-21      2015-03-21          42
4           102         2017-10-24      2016-03-22          44

2 Answers 2

2

You can use merge_asof with by and direction parameters:

pd.merge_asof(matches.sort_values('date'), 
              teams_att.sort_values('date'), 
              on='date', by='team_id', 
              direction='nearest')

Output:

   match_id  team_id       date  id  overall_rating
0         3      102 2010-05-21   7              42
1         1      101 2012-05-17   3              73
2         2      101 2014-07-11   5              74
3         4      102 2017-10-24   8              44
Sign up to request clarification or add additional context in comments.

4 Comments

This is cool , I do not know , we have by here.. learn it , thanks man :-)
@Wen Well, I have returned one of many favors you've done for me.
@ScottBoston what if the two dataframes have different names for the key column [the one given to 'by']
@IsmailKarchi There are parameters left_on and right_on and left_by and right_by. pd.merge_asof
1

We using merge_asof (Please check Scott's answer, that is the right way for solving this type problem :-) cheers )

g1=df1.groupby('team_id')
g=df.groupby('team_id')

l=[]
for x in [101,102]:
   l.append(pd.merge_asof(g.get_group(x),g1.get_group(x),on='date',direction ='nearest'))


pd.concat(l)
Out[405]: 
   match_id  team_id_x       date  id  team_id_y  overall_rating
0         1        101 2012-05-17   3        101              73
1         2        101 2014-07-11   5        101              74
0         3        102 2010-05-21   7        102              42
1         4        102 2017-10-24   8        102              44

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.