0

For the following series, drop_duplicates is not working correctly:

8672.0
8672.0
8672.0
8672.0
8670.0
8670.0
8670.0
8670.0
8670.0
8670.0
8672.0
8672.0
8672.0
8672.0
8672.0
8672.0
8672.0
8672.0
8672.0
8672.0
8670.0
8670.0
8670.0
8670.0
8670.0

Using drop_duplicates(keep='first'), I expected it to return 4 values:

8672.0
8670.0
8672.0
8670.0

But it actually returns only the first 2 values:

8672.0
8670.0

What's wrong with it, or is there a way to use drop_duplicates to get the values I want? Thank you so much.

2 Answers

3

drop_duplicates() removes all duplicates, wherever they occur in the Series, not only consecutive ones.

Assuming s is a Series, keep only the rows whose difference from the previous row is non-zero:

In [93]: s[s.diff().ne(0)]
Out[93]:
0     8672.0
3     8670.0
9     8672.0
19    8670.0
Name: 8672.0, dtype: float64
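
To make the difference concrete, here is a minimal sketch that rebuilds the series from the question (runs of 4, 6, 10 and 5 values; the variable names are only illustrative) and compares the two approaches:

```python
import pandas as pd

# Rebuild the series from the question: runs of 4, 6, 10 and 5 values.
s = pd.Series([8672.0] * 4 + [8670.0] * 6 + [8672.0] * 10 + [8670.0] * 5)

# drop_duplicates keeps the first occurrence of each distinct value
# anywhere in the series, so only two rows survive.
unique_values = s.drop_duplicates(keep="first")

# Keeping a row only when its difference from the previous row is non-zero
# collapses each consecutive run to its first element. The first row has a
# NaN difference, which also counts as "not equal to 0", so it is kept.
run_starts = s[s.diff().ne(0)]

print(unique_values.tolist())  # [8672.0, 8670.0]
print(run_starts.tolist())     # [8672.0, 8670.0, 8672.0, 8670.0]
```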

3 Comments

Thank you, but why is drop_duplicates not behaving correctly? Or am I using it wrongly?
OMG... guess I've been using it wrong for years... Thank you so much!
@Nathan, glad i could help :)
2

I think you need the first value of each consecutive run, so the solution is to compare with the shifted values and filter by boolean indexing:

s1 = s[s.ne(s.shift())]
print(s1)
0     8672.0
4     8670.0
10    8672.0
20    8670.0
Name: col, dtype: float64
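
One practical point about the shift comparison is that it does not rely on subtraction, so it should also work on non-numeric data. A small sketch, using a made-up series of strings (the data and names here are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical non-numeric data; diff() needs subtraction, which strings
# do not support, so the shift comparison is used instead.
labels = pd.Series(["a", "a", "b", "b", "b", "a", "c", "c"])

# True wherever the value differs from the one before it.
# The first row is compared with a missing value, so it is always kept.
is_run_start = labels.ne(labels.shift())

print(labels[is_run_start].tolist())  # ['a', 'b', 'a', 'c']
```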

1 Comment

@Nathan - You are welcome! I think both solutions are nice :)
