drop_duplicates in pandas when duplicate is only in first column

Question

I have a dataframe with two columns. The first column, say A, has duplicates, the second does not.

I have tried

df["A"].drop_duplicates(inplace=True)

but that returns the same number of rows. How can I drop the rows where the value in column "A" is the same?

Example:

John Miller
John Smith
Mark Robinson
Jeffrey Robinson

should return

John Miller
Mark Robinson
Jeffrey Robinson

Community · Accepted Answer · 2020-06-20 09:12:55Z

Use drop_duplicates with parameter subset:

df.drop_duplicates(subset=['A'], inplace=True)
print(df)
         A         B
0     John    Miller
2     Mark  Robinson
3  Jeffrey  Robinson

Docs:

subset : column label or sequence of labels, optional

Only consider certain columns for identifying duplicates; by default, use all of the columns.
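For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the data from the question sits in a DataFrame with columns named "A" and "B" (the names used in the output above), and it uses the non-inplace form, which returns a new DataFrame instead of modifying df. The variable names deduped and unique_names are only illustrative.

```python
import pandas as pd

# Sample data from the question; the column names "A" and "B" are assumed.
df = pd.DataFrame(
    {
        "A": ["John", "John", "Mark", "Jeffrey"],
        "B": ["Miller", "Smith", "Robinson", "Robinson"],
    }
)

# Compare rows on column "A" only; the first occurrence of each value is kept.
deduped = df.drop_duplicates(subset=["A"])
print(deduped)

# Dropping duplicates on the column alone yields a shorter Series,
# but the DataFrame itself still has all four rows.
unique_names = df["A"].drop_duplicates()
print(len(unique_names), len(df))  # prints: 3 4
```

The last two lines suggest why the attempt in the question did not help: df["A"] is a separate Series, so de-duplicating it does not remove any rows from df.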

edited Jun 20, 2020 at 9:12

CommunityBot

answered Jan 6, 2017 at 17:52

jezrael

1 Comment

user Over a year ago

Great, this is what I wanted.
