Perform merge for specific duplicate rows in pandas DataFrame

Question

Consider the following two DataFrames in Python:

df:

code_1	other
19001	white
19009	blue
19008	red

df_1:

code_1	code_2
19001	00001
19001	00002
19009	00003
19008	00004

I want to merge df with df_1:

    df_merge = pd.merge(df, df_1, how="left", on=['code_1'])

df_merge:

code_1	other	code_2
19001	white	00001
19001	white	00002
19009	blue	00003
19008	red	00004

I want the merge to ignore duplicate code_1 values in df_1 and use only the first matching row for each code_1. I could call drop_duplicates on [other, code_1] after the merge, but I would like to know whether the merge function has a parameter that does this directly.

Expected result:

code_1	other	code_2
19001	white	00001
19009	blue	00003
19008	red	00004

HedgeHog · Accepted Answer · 2022-11-04 10:47:16Z

As far as I know, there is no specific parameter of pandas.merge() that fits your needs, but you can get the same result by dropping duplicates before merging, assuming the duplicates occur only in df_1:

    df_merge = df.merge(df_1.drop_duplicates('code_1'), how="left", on=['code_1'])
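For reference, here is a minimal, self-contained sketch of that approach. The sample data is modeled on the question's tables (the 19008 row uses 00004, matching the question's expected result), and the codes are stored as strings, which is an assumption made here so that the leading zeros survive. The name df_first is only illustrative. The validate="m:1" argument is optional: it asks pandas to raise an error if the right-hand keys are not unique, which checks that the deduplication did its job.

```python
import pandas as pd

# Sample data modeled on the question; codes are kept as strings
# (an assumption) so the leading zeros in code_2 are preserved.
df = pd.DataFrame({
    "code_1": ["19001", "19009", "19008"],
    "other": ["white", "blue", "red"],
})
df_1 = pd.DataFrame({
    "code_1": ["19001", "19001", "19009", "19008"],
    "code_2": ["00001", "00002", "00003", "00004"],
})

# Keep only the first row per code_1 on the right-hand side.
df_first = df_1.drop_duplicates(subset="code_1", keep="first")

# Left merge; validate="m:1" raises pandas.errors.MergeError if
# code_1 is not unique in the right-hand frame.
df_merge = df.merge(df_first, how="left", on="code_1", validate="m:1")

print(df_merge)
```

If df itself could contain repeated code_1 values, this still returns one output row per row of df, since only the right-hand side is deduplicated.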
