Python Pandas get matching data from different dataframe

Question

I have 2 dataframes.

The first one, let's call it requests (df1):

id  number  val
0   09876   1
1   12345   2
2   23456   3
3   34567   4

Then I have another dataframe, let's call it receipts (df2):

id  item    ref     receipt
0   shoes   34567   #Pos32
1   socks   12345   #Pos33

Requests will be my main dataframe that I will be working with and adding data.

I need to add a new column to requests, based on some data from receipts.

If df2 contains a ref which equals the number from df1, I want to create a new column in df1 called receipt with the allocated receipt number.

I tried the following using NumPy:

df1['receipt'] = np.where(df1['number'] == df2['ref'], df2['receipt'], '')

but I'm greeted with ValueError: Can only compare identically-labeled Series objects, which makes sense because the dataframes will not have the same order.

Any other suggestions how to get past this?

At the end, I'd like my dataframe to look something like this:

id      number      val    receipt
0       09876       1
1       12345       2      #Pos33
2       23456       3
3       34567       4      #Pos32

Thanks

Look at using map if you are joining on one column and returning one column; otherwise you want to merge your two dataframes.
– Scott Boston, Commented Nov 24, 2021 at 13:01

Scott Boston · Accepted Answer · 2021-11-24 13:05:30Z

Try map:

df1['receipt'] = df1['number'].map(df2.set_index('ref')['receipt'])
print(df1)

Output:

   id  number  val receipt
0   0    9876    1     NaN
1   1   12345    2  #Pos33
2   2   23456    3     NaN
3   3   34567    4  #Pos32
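
For a self-contained version of the map approach, here is a minimal sketch. The frames are rebuilt by hand from the question. Storing the keys as strings (so the leading zero in 09876 survives) and the fillna('') step are assumptions added here to get the blank cells the question asks for; they are not part of the answer above.

```python
import pandas as pd

# Frames rebuilt by hand from the question; string keys are an assumption
df1 = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "number": ["09876", "12345", "23456", "34567"],
    "val": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "id": [0, 1],
    "item": ["shoes", "socks"],
    "ref": ["34567", "12345"],
    "receipt": ["#Pos32", "#Pos33"],
})

# Series indexed by ref, used as a lookup table from ref to receipt
lookup = df2.set_index("ref")["receipt"]

# Numbers with no matching ref become NaN; fillna("") blanks them out
df1["receipt"] = df1["number"].map(lookup).fillna("")
print(df1)
```

Both key columns need the same dtype for the lookup to match; if one side is int and the other str, every row would come back unmatched.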

Otherwise use merge:

df1.merge(df2, left_on='number', right_on='ref', how='left')

Output:

   id_x  number  val  id_y   item      ref receipt
0     0    9876    1   NaN    NaN      NaN     NaN
1     1   12345    2   1.0  socks  12345.0  #Pos33
2     2   23456    3   NaN    NaN      NaN     NaN
3     3   34567    4   0.0  shoes  34567.0  #Pos32
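
If only the receipt column is wanted, the merge can probably be narrowed. The sketch below assumes the same frames as in the question, rebuilt by hand with integer keys. Selecting just ref and receipt from df2 before merging avoids the id_x/id_y suffixes, and the helper ref column is dropped afterwards.

```python
import pandas as pd

# Frames rebuilt by hand from the question, with integer keys
df1 = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "number": [9876, 12345, 23456, 34567],
    "val": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "id": [0, 1],
    "item": ["shoes", "socks"],
    "ref": [34567, 12345],
    "receipt": ["#Pos32", "#Pos33"],
})

# Left merge keeps every row of df1; only ref and receipt come from df2
merged = (
    df1.merge(df2[["ref", "receipt"]], left_on="number", right_on="ref", how="left")
       .drop(columns="ref")  # ref duplicates number, so drop it
)
print(merged)
```

Note that merge returns a new dataframe rather than modifying df1, so the result has to be assigned.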

4 Comments

Ronald Langeveld Over a year ago

Thanks! I gave it a try, but with the map method I'm getting pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects.

Scott Boston Over a year ago

@RonaldLangeveld Yes, if df2 has multiple rows with the same 'ref', then you need to do a drop_duplicates or figure out which ref you need to use: df.drop_duplicates('ref').set_index('ref')['receipt'] inside map.

Ronald Langeveld Over a year ago

Thanks, Scott. Weirdly it doesn’t seem to be having any effect.

Scott Boston Over a year ago

@RonaldLangeveld Typo in the above; it should be df2.drop_duplicates.....
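
The suggestion in these comments can be sketched as follows. The duplicated ref in df2 is invented for illustration (the question's sample data has none), and keeping the first receipt per ref is just the drop_duplicates default, not necessarily the right business rule.

```python
import pandas as pd

df1 = pd.DataFrame({"number": [12345, 34567], "val": [2, 4]})

# Hypothetical df2 in which ref 12345 appears twice
df2 = pd.DataFrame({
    "ref": [34567, 12345, 12345],
    "receipt": ["#Pos32", "#Pos33", "#Pos34"],
})

# drop_duplicates returns a new frame, so chain it (or assign the result);
# by default it keeps the first row seen for each ref
lookup = df2.drop_duplicates("ref").set_index("ref")["receipt"]

# The lookup index is now unique, so map can use it
df1["receipt"] = df1["number"].map(lookup)
print(df1)
```

Calling df2.drop_duplicates('ref') on its own line without using the result leaves df2 unchanged, which is one way the fix can appear to have no effect.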
