Pandas: drop_duplicates not working correctly

Question

For the following series, drop_duplicates is not working correctly:

Using drop_duplicates(keep='first'), I expect it to return 4 values:

But it actually returns only the first 2 values:

8672.0
8670.0
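The question's original Series is not shown, so the following is only a minimal reproduction with hypothetical data that has the same shape (runs of two alternating values). It illustrates that `drop_duplicates(keep='first')` returns one row per distinct value, which is why only 2 values come back.

```python
import pandas as pd

# Hypothetical data: the question's actual Series is not shown, so this
# only mimics its shape (runs of two alternating values).
s = pd.Series([8672.0, 8672.0, 8670.0, 8670.0, 8672.0, 8670.0, 8670.0])

# drop_duplicates compares values across the whole Series, so only the
# first occurrence of each distinct value survives.
result = s.drop_duplicates(keep='first')
print(result.tolist())        # [8672.0, 8670.0]
print(result.index.tolist())  # [0, 2]
```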

What's wrong with it, or do you have any suggestions for using drop_duplicates to get the values I want? Thank you so much.

MaxU - stand with Ukraine · Accepted Answer · 2018-05-17 11:29:09Z

3

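drop_duplicates() (on a Series as well as on a DataFrame) removes all duplicates, not only consecutive ones: it keeps a single occurrence of each distinct value across the whole object. To keep the first value of every run of consecutive equal values, filter on the positions where the value changes instead.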
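A minimal sketch, using hypothetical data, of how the `keep` parameter behaves: whichever option is chosen, every value is compared against the entire Series, so the result can never contain more rows than there are distinct values.

```python
import pandas as pd

# Hypothetical sample data with runs of repeated values.
s = pd.Series([8672.0, 8672.0, 8670.0, 8670.0, 8672.0, 8670.0, 8670.0])

first = s.drop_duplicates(keep='first')  # first occurrence of each value
last = s.drop_duplicates(keep='last')    # last occurrence of each value
neither = s.drop_duplicates(keep=False)  # drop every value that repeats anywhere

print(first.index.tolist())  # [0, 2]
print(last.index.tolist())   # [4, 6]
print(neither.tolist())      # []
```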

Assuming s is a Series:

In [93]: s[s.diff().ne(0)]
Out[93]:
0     8672.0
3     8670.0
9     8672.0
19    8670.0
Name: 8672.0, dtype: float64
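A self-contained version of the `diff` approach, again with hypothetical data standing in for the question's Series. `s.diff()` is `s - s.shift(1)`, so it is 0 inside a run of equal values and NaN on the first row; since `NaN != 0` evaluates to True, the first row is always kept.

```python
import pandas as pd

# Hypothetical data standing in for the question's Series.
s = pd.Series([8672.0, 8672.0, 8670.0, 8670.0, 8672.0, 8670.0, 8670.0])

# diff() is 0 wherever a value equals its predecessor; keep everything else.
first_of_runs = s[s.diff().ne(0)]
print(first_of_runs.index.tolist())  # [0, 2, 4, 5]
print(first_of_runs.tolist())        # [8672.0, 8670.0, 8672.0, 8670.0]
```

This relies on the values supporting subtraction, so it suits numeric Series; it also compares floats exactly, which matches what "equal consecutive values" means here.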

edited May 17, 2018 at 11:29

answered May 17, 2018 at 11:25


3 Comments

Nathan Over a year ago

Thank you, but why is drop_duplicates not behaving correctly? Or am I using it wrongly?

Nathan Over a year ago

OMG... guess I've been using it wrong for years... Thank you so much!

MaxU - stand with Ukraine Over a year ago

@Nathan, glad I could help :)

jezrael · Accepted Answer · 2018-05-17 11:31:52Z

2

I think you need the first value of each run of consecutive values, so the solution is to compare with the shifted values and filter by boolean indexing:

s1 = s[s.ne(s.shift())]
print(s1)
0     8672.0
4     8670.0
10    8672.0
20    8670.0
Name: col, dtype: float64
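A minimal sketch wrapping the `shift` comparison in a helper (the name `first_of_each_run` is hypothetical, chosen for illustration). Each element is compared with its predecessor; the first row is compared with the NaN introduced by `shift()` and is therefore always kept. Because it only needs equality, not subtraction, this variant also works on non-numeric data such as strings.

```python
import pandas as pd

def first_of_each_run(s: pd.Series) -> pd.Series:
    """Keep the first element of every run of consecutive equal values."""
    # shift() moves values down one row (first row becomes NaN), so ne()
    # is True exactly where a value differs from the one before it.
    return s[s.ne(s.shift())]

# Hypothetical numeric data.
nums = pd.Series([8672.0, 8672.0, 8670.0, 8670.0, 8672.0, 8670.0, 8670.0])
num_runs = first_of_each_run(nums)
print(num_runs.index.tolist())  # [0, 2, 4, 5]

# Hypothetical string data: diff() would raise a TypeError here.
labels = pd.Series(['a', 'a', 'b', 'a', 'a'])
label_runs = first_of_each_run(labels)
print(label_runs.tolist())  # ['a', 'b', 'a']
```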

edited May 17, 2018 at 11:31

answered May 17, 2018 at 11:25


1 Comment

jezrael Over a year ago

@Nathan - You are welcome! I think both solutions are nice :)
