I have a DataFrame with duplicated rows. I'd like to get a DataFrame with a unique index and no duplicates; it's fine to discard the duplicated values. Is this possible? Would it be done with groupby?
2 Answers
In [29]: df.drop_duplicates()
Out[29]:
b c
1 2 3
3 4 0
7 5 9
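A minimal runnable sketch of this call, assuming the duplicated frame from the question; note that drop_duplicates compares row values (not the index), and the keep parameter controls which occurrence survives:

```python
import pandas as pd

# Frame with a duplicated row, mirroring the question's data.
df = pd.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])

# Default keep='first' retains the first occurrence of each duplicated row.
deduped = df.drop_duplicates()

# keep='last' would retain the last occurrence instead.
deduped_last = df.drop_duplicates(keep='last')
```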
4 Comments
ely
It's worthwhile to note this takes either the first or last occurrence. So you need to sort by some other quantity first (if you're lucky) or do some complicated groupby logic anyway.
safetyduck
This is wrong. drop_duplicates acts on the values only (at least in my version). You need to reset_index if you want to drop on index and values or just work with the index if you want to have a unique index. Maybe there is another way besides groupby to enforce unique index?
Flavian Hautbois
Use df.drop_duplicates(inplace=True) if you don't want to assign a new variable.
dashesy
This does not give a DataFrame with a unique index; the solution by @Adam Greenhall below, however, works for that.
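Following safetyduck's suggestion, one way to deduplicate on index and values together is to move the index into a column with reset_index so drop_duplicates can see it, then restore it; a sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'b': [2, 2, 4], 'c': [3, 3, 0]}, index=[1, 1, 3])

# reset_index turns the index into a column (named 'index'), so
# drop_duplicates compares index label and values together; set_index
# then restores the original index.
deduped = (df.reset_index()
             .drop_duplicates()
             .set_index('index'))
```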
Figured out one way to do it by reading the split-apply-combine documentation examples.
import pandas
df = pandas.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])
df_unique = df.groupby(level=0).first()
df
b c
1 2 3
1 2 3
3 4 0
7 5 9
df_unique
b c
1 2 3
3 4 0
7 5 9
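As a sketch of why this yields a unique index: groupby(level=0) groups rows by index label, and .first() keeps the first row of each group, even when rows sharing a label carry different values (the data below is hypothetical, not the answer's):

```python
import pandas as pd

# Two rows share index label 1 but hold different values.
df = pd.DataFrame({'b': [2, 9, 4], 'c': [3, 8, 0]}, index=[1, 1, 3])

# Grouping on the index label keeps the first row per label,
# discarding the later conflicting row.
df_unique = df.groupby(level=0).first()
```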
3 Comments
hobs
This relies on the row index being duplicated for exactly those rows where the data fields (b, c) are duplicated, effectively making the index part of the row vector that you want to be unique (not duplicated).
rogueleaderr
If you have duplicated index entries, this is the answer you want.
dashesy
I was getting ValueError: Index contains duplicate entries, cannot reshape when calling unstack on a MultiIndex, but this solution works for that too; I only had to do df_unique = df.groupby(level=[0,1]).first()
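A sketch of dashesy's MultiIndex variant, with hypothetical data: a duplicated (outer, inner) index pair makes unstack raise the "cannot reshape" error, and grouping on both levels first removes the duplicates:

```python
import pandas as pd

# MultiIndex with a duplicated ('a', 1) entry, which would make a
# plain unstack raise "Index contains duplicate entries, cannot reshape".
idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 1), ('b', 2)])
df = pd.DataFrame({'v': [10, 11, 12]}, index=idx)

# Collapse duplicates across both index levels, then unstack safely.
df_unique = df.groupby(level=[0, 1]).first()
wide = df_unique.unstack()
```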