57

I have a dataframe df:

id   name   count
1    a       10
2    b       20
3    c       30
4    d       40
5    e       50

Here I have another dataframe df2:

id1  price   rating
 1     100     1.0
 2     200     2.0
 3     300     3.0
 5     500     5.0

I want to join these two dataframes on column id and id1. Here is an example of df3:

id   name   count   price   rating
1    a       10      100      1.0
2    b       20      200      2.0
3    c       30      300      3.0
4    d       40      Nan      Nan
5    e       50      500      5.0

Should I use df.merge or pd.concat?

2
  • @piRSquared Sir, both answers are absolutely correct :) Can't pick both, just one question, suppose i am dealing with two dataframes each of around 4 million rows. i want the fastest way, in between join,merge and map Which one should be the most optimized way Commented Jan 4, 2017 at 12:21
  • 2
    Both are essentially the same under the hood. And I don' t care which you pick. @jezrael and I are always on SO. We'll get our rep one way or another. I'm more interested in making sure those 15 rep don't go to waste. Pick his because he was a few micro seconds faster than me on this one ;-) Commented Jan 4, 2017 at 12:23

2 Answers 2

96

Use merge:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

Another solution is simple rename column:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If need only column price the simpliest is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Another 2 solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
Sign up to request clarification or add additional context in comments.

11 Comments

keeping this answer, if from df2 i have to select only 1 column 'price' then?
I dont understand, can you explain more?
final table has id name count id1 price rating but i want only price from df2 not rating, then?
the 2 ways that you suggested both are correct,right?
Yes, all solutions are correct. If need add more column, nicer and better is join (not necessary delete column, left join by default), but if need add only one column map is faster.
|
12

join utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left' dataframe.

Strategy:

  • set_index on df2 to be id1
  • use join with df as the left dataframe and id as the on parameter. Note that I could have set_index('id') on df to avoid having to use the on parameter. However, this allowed me leave the column in the dataframe rather than having to reset_index later.

df.join(df2.set_index('id1'), on='id')

   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you only want price from df2

df.join(df2.set_index('id1')[['price']], on='id')


   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

1 Comment

keeping this answer, if from df2 i have to select only 1 column 'price' then?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.