Remove duplicate rows from Pandas dataframe where only some columns have the same value

Question

I have a pandas dataframe as follows:

Among rows that share the same values in specific columns, I want only one row to remain. In the example above, those columns are A and B. In other words, if a combination of values in columns A and B occurs more than once in the dataframe, only one of those rows should remain (which one does not matter).

FWIW: the maximum number of so-called duplicate rows (that is, rows where columns A and B are the same) is 2.

The result should look like this:

or

jezrael · Accepted Answer · 2017-06-11 08:21:34Z

26

Use drop_duplicates with the subset parameter. By default the first row of each duplicated group is kept; to keep the last one instead, add keep='last':

df1 = df.drop_duplicates(subset=['A', 'B'])
# same as
# df1 = df.drop_duplicates(subset=['A', 'B'], keep='first')
print(df1)
   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x

df2 = df.drop_duplicates(subset=['A', 'B'], keep='last')
print(df2)
   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x
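
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the two printed outputs above, since the question's original data is not shown here. The boolean-mask variant with duplicated is an equivalent alternative, not part of the answer itself.

```python
import pandas as pd

# Assumed input, reconstructed from the printed outputs above.
df = pd.DataFrame({
    "A": [1, 1, 3, 3],
    "B": [2, 2, 4, 5],
    "C": ["x", "y", "z", "x"],
})

# Keep the first row of each (A, B) group (the default).
first = df.drop_duplicates(subset=["A", "B"])

# Keep the last row of each (A, B) group instead.
last = df.drop_duplicates(subset=["A", "B"], keep="last")

# Equivalent to `first`: duplicated() flags every repeat after the first
# occurrence, so inverting the mask keeps one row per group.
mask = df.duplicated(subset=["A", "B"])
first_via_mask = df[~mask]

print(first)
print(last)
```

The original index labels are preserved in both results; chaining .reset_index(drop=True) renumbers the rows if that matters.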


2 Comments

Poka Over a year ago

@jezrael What if I want to remove duplicates from just one column without removing rows? Say I have 10 rows that correspond to the same time instant. When I write them to a txt file, I want the time instant printed only once for all 10 rows instead of repeated on every row.

jezrael Over a year ago

@Poka - If you don't want to remove rows, the only solution is to replace the duplicated values with NaN or an empty string. Something like this solution
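
The solution linked in that comment is not preserved here. As a rough sketch of the idea only, assuming a hypothetical frame with a time column and a value column (both names and all data are made up for illustration), the repeated values can be blanked with duplicated and mask while every row is kept:

```python
import pandas as pd

# Hypothetical data: several rows share the same time instant.
df = pd.DataFrame({
    "time": ["10:00", "10:00", "10:00", "10:05", "10:05"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# True for every repeat of a time value after its first occurrence.
repeated = df["time"].duplicated()

# Replace repeats with an empty string; no rows are dropped.
blanked = df.assign(time=df["time"].mask(repeated, ""))

# Same idea, but leaving missing values (mask's default replacement).
with_nan = df.assign(time=df["time"].mask(repeated))

# Tab-separated text, suitable for writing to a .txt file.
text = blanked.to_csv(sep="\t", index=False)
print(text)
```

Note that duplicated flags repeats anywhere in the column, not only in consecutive rows, so this assumes rows for the same time instant should be shown once overall.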
