I have a DataFrame with duplicated rows. I'd like to get a DataFrame with a unique index and no duplicates; it's fine to discard the duplicated values. Is this possible? Would it be done with groupby?
2 Answers
In [29]: df.drop_duplicates()
Out[29]:
b c
1 2 3
3 4 0
7 5 9
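A minimal runnable sketch of this call, assuming the duplicated frame from the question; note that drop_duplicates compares row values (not the index), and the keep parameter controls which occurrence survives:

```python
import pandas as pd

# Frame with a duplicated row, mirroring the question's data.
df = pd.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])

# Default keep='first' retains the first occurrence of each duplicated row.
deduped = df.drop_duplicates()

# keep='last' would retain the last occurrence instead.
deduped_last = df.drop_duplicates(keep='last')
```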
4 Comments
ely
It's worthwhile to note this takes either the first or last occurrence. So you need to sort by some other quantity first (if you're lucky) or do some complicated groupby logic anyway.
safetyduck
This is wrong. drop_duplicates acts on the values only (at least in my version). You need to reset_index if you want to drop on index and values or just work with the index if you want to have a unique index. Maybe there is another way besides groupby to enforce unique index?
Flavian Hautbois
Use df.drop_duplicates(inplace=True) if you don't want to assign a new variable.
dashesy
This does not give a DataFrame with a unique index; the solution by @Adam Greenhall below, however, works for that.
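Following safetyduck's suggestion, one way to deduplicate on index and values together is to move the index into a column with reset_index so drop_duplicates can see it, then restore it; a sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'b': [2, 2, 4], 'c': [3, 3, 0]}, index=[1, 1, 3])

# reset_index turns the index into a column (named 'index'), so
# drop_duplicates compares index label and values together; set_index
# then restores the original index.
deduped = (df.reset_index()
             .drop_duplicates()
             .set_index('index'))
```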
Figured out one way to do it by reading the split-apply-combine documentation examples.
import pandas
df = pandas.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])
df_unique = df.groupby(level=0).first()
df
b c
1 2 3
1 2 3
3 4 0
7 5 9
df_unique
b c
1 2 3
3 4 0
7 5 9
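As a sketch of why this yields a unique index: groupby(level=0) groups rows by index label, and .first() keeps the first row of each group, even when rows sharing a label carry different values (the data below is hypothetical, not the answer's):

```python
import pandas as pd

# Two rows share index label 1 but hold different values.
df = pd.DataFrame({'b': [2, 9, 4], 'c': [3, 8, 0]}, index=[1, 1, 3])

# Grouping on the index label keeps the first row per label,
# discarding the later conflicting row.
df_unique = df.groupby(level=0).first()
```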
3 Comments
hobs
This relies on the row index being duplicated for exactly those rows where the data fields (b, c) are duplicated, effectively making the index part of the row vector that you want to be unique (not duplicated).
rogueleaderr
If you have duplicated index entries, this is the answer you want.
dashesy
I was getting ValueError: Index contains duplicate entries, cannot reshape when calling unstack on a MultiIndex, but this solution works for that too; I only had to do df_unique = df.groupby(level=[0,1]).first()
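A sketch of dashesy's MultiIndex variant, with hypothetical data: a duplicated (outer, inner) index pair makes unstack raise the "cannot reshape" error, and grouping on both levels first removes the duplicates:

```python
import pandas as pd

# MultiIndex with a duplicated ('a', 1) entry, which would make a
# plain unstack raise "Index contains duplicate entries, cannot reshape".
idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 1), ('b', 2)])
df = pd.DataFrame({'v': [10, 11, 12]}, index=idx)

# Collapse duplicates across both index levels, then unstack safely.
df_unique = df.groupby(level=[0, 1]).first()
wide = df_unique.unstack()
```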