I have 2 dataframes.

The first one, let's call it requests (df1):

id  number  val
0   09876   1
1   12345   2
2   23456   3
3   34567   4

Then I have another dataframe, let's call it receipts (df2):

id  item    ref     receipt
0   shoes   34567   #Pos32
1   socks   12345   #Pos33

Requests will be my main dataframe that I will be working with and adding data.

I need to add a new column to requests, based on some data from receipts.

If df2 contains a ref that equals the number from df1, I want to create a new column in df1 called receipt with the allocated receipt number.

I tried the following using Numpy,

df1['receipt'] = np.where(df1['number'] == df2['ref'], df2['receipt'], '')

but I'm greeted with ValueError: Can only compare identically-labeled Series objects, which makes sense because the two dataframes don't have the same length or row order.

Any other suggestions how to get past this?
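For reference, here is a minimal sketch that reproduces the error on the sample data. The frames are retyped from the tables above, and number and ref are assumed to be strings because of the leading zero in 09876. The comparison fails because the two Series carry different index labels (four rows against two). `Series.isin` is included only as an order-independent membership check; it is not part of the original attempt.

```python
import pandas as pd

# Sample data retyped from the question; string dtype is an assumption.
df1 = pd.DataFrame({"number": ["09876", "12345", "23456", "34567"],
                    "val": [1, 2, 3, 4]})
df2 = pd.DataFrame({"item": ["shoes", "socks"],
                    "ref": ["34567", "12345"],
                    "receipt": ["#Pos32", "#Pos33"]})

# Element-wise == needs both Series to share the same index labels.
try:
    df1["number"] == df2["ref"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# isin ignores row order and length: it only asks whether each number
# appears anywhere in df2["ref"].
has_receipt = df1["number"].isin(df2["ref"])
print(has_receipt.tolist())
```

The boolean mask says which rows have a receipt, but not which receipt, so a lookup or a join is still needed.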

At the end I'd like my dataframe to look something like this:

id      number      val    receipt
0       09876       1
1       12345       2      #Pos33
2       23456       3
3       34567       4      #Pos32

Thanks

  • Look at using map if you are joining on one column and returning one column; otherwise you want to merge your two dataframes. Commented Nov 24, 2021 at 13:01

1 Answer

Try map:

df1['receipt'] = df1['number'].map(df2.set_index('ref')['receipt'])
print(df1)

Output:

   id  number  val receipt
0   0    9876    1     NaN
1   1   12345    2  #Pos33
2   2   23456    3     NaN
3   3   34567    4  #Pos32
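A self-contained sketch of the map approach, in case it helps to run it end to end. The frames are rebuilt from the question's tables with integer keys (matching the output above), the id column is left out for brevity, and the name `lookup` is just illustrative. The trailing `fillna("")` is an optional extra that gives the empty strings the question asked for instead of NaN.

```python
import pandas as pd

# Sample data rebuilt from the question (integer keys assumed).
df1 = pd.DataFrame({"number": [9876, 12345, 23456, 34567],
                    "val": [1, 2, 3, 4]})
df2 = pd.DataFrame({"item": ["shoes", "socks"],
                    "ref": [34567, 12345],
                    "receipt": ["#Pos32", "#Pos33"]})

# Build a ref -> receipt lookup Series, then map each number through it.
lookup = df2.set_index("ref")["receipt"]
df1["receipt"] = df1["number"].map(lookup).fillna("")  # "" where no match

print(df1)
```

Note that the key columns need the same dtype on both sides; if number holds strings such as "09876" while ref holds integers, nothing will match.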

Otherwise use merge:

df1.merge(df2, left_on='number', right_on='ref', how='left')

Output:

   id_x  number  val  id_y   item      ref receipt
0     0    9876    1   NaN    NaN      NaN     NaN
1     1   12345    2   1.0  socks  12345.0  #Pos33
2     2   23456    3   NaN    NaN      NaN     NaN
3     3   34567    4   0.0  shoes  34567.0  #Pos32
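If only the receipt column is wanted, one possible variation is to select just the key and the value from df2 before merging, then drop the duplicate key afterwards. This is a sketch on the question's sample data (integer keys assumed), not the only way to do it; `merged` is an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the question (integer keys assumed).
df1 = pd.DataFrame({"id": [0, 1, 2, 3],
                    "number": [9876, 12345, 23456, 34567],
                    "val": [1, 2, 3, 4]})
df2 = pd.DataFrame({"id": [0, 1],
                    "item": ["shoes", "socks"],
                    "ref": [34567, 12345],
                    "receipt": ["#Pos32", "#Pos33"]})

# Merge only ref and receipt, so no id_x/id_y suffixes or item column appear,
# then drop ref because it duplicates number.
merged = (
    df1.merge(df2[["ref", "receipt"]],
              left_on="number", right_on="ref", how="left")
       .drop(columns="ref")
)

print(merged)
```

With `how='left'` every row of df1 is kept, and rows without a matching ref get NaN in receipt.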

4 Comments

Thanks! I gave it a try, but with the map method I'm getting pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
@RonaldLangeveld Yes, if df2 has multiple rows with the same 'ref' then you need to do a drop_duplicates or figure out which ref you need to use. df.drop_duplicates('ref').set_index('ref')['receipt'] inside map.
Thanks, Scott. Weirdly it doesn’t seem to be having any effect.
@RonaldLangeveld Typo in the above.. should be df2.drop_duplicates.....
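A minimal sketch of the fix discussed in these comments. The duplicate row (a second receipt, #Pos34, for ref 12345) is hypothetical data added only to trigger the situation, and keeping the first occurrence is an assumption; which receipt should win depends on the real data.

```python
import pandas as pd

df1 = pd.DataFrame({"number": [9876, 12345, 23456, 34567],
                    "val": [1, 2, 3, 4]})

# Hypothetical receipts with a repeated ref (12345 appears twice).
df2 = pd.DataFrame({"item": ["shoes", "socks", "socks"],
                    "ref": [34567, 12345, 12345],
                    "receipt": ["#Pos32", "#Pos33", "#Pos34"]})

# drop_duplicates keeps the first row per ref by default, which makes the
# lookup index unique so that map can use it.
lookup = df2.drop_duplicates(subset="ref").set_index("ref")["receipt"]
df1["receipt"] = df1["number"].map(lookup)

print(df1)
```

Passing `keep="last"` to `drop_duplicates` would keep the most recent row per ref instead.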
