Consider two dataframes:

>>> X = pd.DataFrame(np.arange(0,12).reshape(4,3),columns=['a','b','c'])
>>> X
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
>>> 
>>> Y = pd.DataFrame(np.array([['abc',22],['fgh',44],['ijk',0],['xee',99],['RGD',3]]),columns = ['x','y'])
>>> Y
     x   y
0  abc  22
1  fgh  44
2  ijk   0
3  xee  99
4  RGD   3

I want to join these two dataframes so that each value in X['a'] that matches a value in Y['y'] is replaced by the corresponding Y['x'], giving:

     a   b   c
0  ijk   1   2
1  RGD   4   5
2    6   7   8
3    9  10  11

I have tried the following:

>>> X.loc[X['a'].astype(str).isin(Y['y']),'a']=Y[Y['y'].astype(str).isin(X['a'])]
>>> X
     a   b   c
0  nan   1   2
1  nan   4   5
2 6.00   7   8
3 9.00  10  11

I think it is trying to match them index by index, which gives me NaN. I have also tried joining X and Y but can't get that to work. I think merging the two dataframes would work, but I don't know how to merge them on columns 'a' and 'y' appropriately.

Any tips here would be greatly appreciated.

1 Answer

You can use map to replace each value of a in X with the matching x from Y where one exists, and keep the original value otherwise:

X['a'] = X.a.astype(str).map(Y.set_index('y').x).fillna(X.a)
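
As a rough sketch of what that one-liner does step by step, assuming the string-typed Y from the question: map called with a Series looks each value up in that Series' index, so unmatched values come back as NaN and fillna restores the originals. The names lookup and mapped are only for illustration, and the astype(object) cast is my own precaution, not part of the line above.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.arange(0, 12).reshape(4, 3), columns=['a', 'b', 'c'])
# Building Y from a mixed np.array turns every value into a string, as in the question.
Y = pd.DataFrame(np.array([['abc', 22], ['fgh', 44], ['ijk', 0], ['xee', 99], ['RGD', 3]]),
                 columns=['x', 'y'])

# Series indexed by y whose values are x, e.g. '0' -> 'ijk', '3' -> 'RGD'.
lookup = Y.set_index('y')['x']

# Cast a to str so it can match the string index; values with no match become NaN.
mapped = X['a'].astype(str).map(lookup)

# The result mixes strings and integers, so keep it as object dtype
# (a precaution for pandas versions that infer a dedicated string dtype),
# then fall back to the original value wherever there was no match.
X['a'] = mapped.astype(object).fillna(X['a'])
print(X)
```

Rows whose a has no counterpart in Y['y'] keep their original integer value, so the column ends up with mixed types.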


Another option is merge. (I corrected the data type in Y, i.e. this assumes the y column is numeric instead of string):

X = pd.DataFrame(np.arange(0,12).reshape(4,3),columns=['a','b','c'])
Y = pd.DataFrame([['abc',22],['fgh',44],['ijk',0],['xee',99],['RGD',3]],columns = ['x','y'])

Then a left merge of column a in X against column y in Y gives:

mX = X.merge(Y.set_index("y"), left_on="a", right_index=True, how="left")
mX

Then, depending on your needs, you can combine columns a and x or leave them as they are, which I think is actually more reasonable.

To combine columns a and x, you can just do:

mX.assign(a = mX.x.fillna(mX.a)).drop('x', axis=1)

This gives the same result as the first option.
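
If you want to check that claim yourself, here is a small sketch that runs both options on the numeric Y and compares column a. The names via_map, via_merge and same are made up for the comparison, and the astype(object) casts are an added precaution for mixed string/integer columns rather than part of the original snippets.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.arange(0, 12).reshape(4, 3), columns=['a', 'b', 'c'])
# y is numeric here because Y is built from a list of lists, not a string np.array.
Y = pd.DataFrame([['abc', 22], ['fgh', 44], ['ijk', 0], ['xee', 99], ['RGD', 3]],
                 columns=['x', 'y'])

# Option 1: map through a Series indexed by y (no astype(str) needed for numeric y).
mapped = X['a'].map(Y.set_index('y')['x'])
via_map = X.assign(a=mapped.astype(object).fillna(X['a']))

# Option 2: left merge of a against Y's y index, then fold x back into a.
mX = X.merge(Y.set_index('y'), left_on='a', right_index=True, how='left')
via_merge = mX.assign(a=mX['x'].astype(object).fillna(mX['a'])).drop('x', axis=1)

# Compare the resulting a columns value by value.
same = via_map['a'].tolist() == via_merge['a'].tolist()
print(same)
```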

6 Comments

Thanks - seems to work well. Do you have any idea how to do this with .merge()?
Brilliant thanks again. Could I ask why using set_index("y") is necessary for both of the solutions?
It's just a trick to save the effort of dropping it later on, since merging on two differently named key columns leaves both of them in the result.
Ah OK that makes sense - and for the first solution it seems completely necessary? It seems as though it "matches" based off the index
Right. For the map method it's necessary, as map looks each value up in the index of the Series you pass and returns the value stored there.