import pandas as pd

I have a DataFrame:

d1 = pd.DataFrame({'ID_A':[1, 2, 3], 'name':['Micha', 'Micha', 'Lea']})

-----------------
- ID_A   - name  
-----------------
- 1      - Micha 
- 2      - Micha 
- 3      - Lea   
-----------------

I want to do a self join to get the following:

---------------------------
- ID_A_x - name  - ID_A_y -
---------------------------
- 1      - Micha - 1      -
- 1      - Micha - 2      -
- 3      - Lea   - 3      -
---------------------------

But with

pd.merge(d1, d1, left_on='name', right_on='name', how='left')

I get duplicate pairs that count as the same for me. How can I avoid them? This result is not what I want:

---------------------------
- ID_A_x - name  - ID_A_y -
---------------------------
- 1      - Micha - 1      -
- 1      - Micha - 2      -
- 2      - Micha - 2      -
- 2      - Micha - 1      -
- 3      - Lea   - 3      -
---------------------------

Please help.

2 Answers

2

I am not sure I understood you correctly, but one possible solution (which I think is what you want) is to drop the duplicate names on the left side before merging:

import pandas as pd
d1 = pd.DataFrame({'ID_A':[1, 2, 3], 'name':['Micha', 'Micha', 'Lea']})
pd.merge(d1.drop_duplicates(subset='name'), d1, on='name', how='left')

Output:

   ID_A_x   name  ID_A_y
0       1  Micha       1
1       1  Micha       2
2       3    Lea       3

3 Comments

Works, thank you.
I seem to be having this same issue. Dropping duplicates might work, but it concerns me that it produces them in the first place. Is there a better solution?
@ColoradoGranite The reason for the duplicates is that the keys are not unique. We merge on 'name', and Micha appears twice in that column, so every Micha row on the left matches every Micha row on the right. The only way to avoid duplicates would be to have unique keys, I think. So to answer your question, I can't find anything better with the above data.
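The point about non-unique keys in the comment above can be checked directly. The following is a minimal sketch, not part of the original answer: it counts how many rows each name should contribute to the self join, and it uses the `validate` argument of `pd.merge` so that pandas raises an error instead of silently multiplying rows. The variable names `counts`, `expected_rows`, and `raised` are only illustrative.

```python
import pandas as pd
from pandas.errors import MergeError

d1 = pd.DataFrame({'ID_A': [1, 2, 3], 'name': ['Micha', 'Micha', 'Lea']})

# Each key contributes (rows on the left) * (rows on the right) to the merge.
# In a self join both counts are the same, so it is the count squared.
counts = d1['name'].value_counts()
expected_rows = int((counts * counts).sum())  # 2*2 for Micha + 1*1 for Lea

merged = pd.merge(d1, d1, on='name', how='left')
print(expected_rows, len(merged))

# validate='one_to_many' requires the left keys to be unique,
# so the duplicated 'Micha' makes pandas raise a MergeError.
try:
    pd.merge(d1, d1, on='name', how='left', validate='one_to_many')
    raised = False
except MergeError:
    raised = True
print(raised)
```

If the left side is deduplicated first, as in the answer above, the same `validate='one_to_many'` check should pass, which makes it a cheap guard when duplicates are not expected.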
1


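# The merged result from the question, rebuilt by hand (names shortened).
# Dropping duplicates on (ID_A_y, name) keeps one row per right-hand ID.
pd.DataFrame({'ID_A_x':[1,1,2,2,3],
             'name':['Mi','Mi','Mi','Mi','Lea'],
             'ID_A_y':[1,2,2,1,3]}).drop_duplicates(['ID_A_y','name'])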
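The snippet above works on a hand-built copy of the merged table. As a sketch of the same cleanup applied to the actual self join from the question, assuming the original `d1`, it could look like the following. The `reset_index(drop=True)` call is an addition for tidier output and is not part of the answer.

```python
import pandas as pd

d1 = pd.DataFrame({'ID_A': [1, 2, 3], 'name': ['Micha', 'Micha', 'Lea']})

# Self join on the non-unique key: 5 rows, as in the question.
merged = pd.merge(d1, d1, on='name', how='left')

# Keep only the first row seen for each (ID_A_y, name) pair.
result = merged.drop_duplicates(['ID_A_y', 'name']).reset_index(drop=True)
print(result)
```

Because `drop_duplicates` keeps the first occurrence by default, which `ID_A_x` survives for each name depends on the row order of the merged frame.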

3 Comments

This is perfect for cleaning it up afterwards. Thank you.
Hi @loegare, what is the name of the library that shows "executed in ..ms, finished...."?
@rean It's a Jupyter notebook extension; I don't remember the exact name.
