How to delete rows with duplicate values in succeeding rows

Question

In my CSV data I have a column with the following data:

I read it into a DataFrame and I'd like to delete one of the rows with duplicate numbers, but only if they come immediately one after another. I marked the rows I'd like to remove with an *. Thanks for any suggestions.

DSM · Accepted Answer · 2013-04-08 14:47:57Z


I think you can do this using .shift(), which shifts a Series forward or backward (by default, one step forward). You want to keep a row if it isn't the same as the previous one, so something like:

 df[df["A"] != df["A"].shift()]
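A self-contained version of the same mask is sketched below (`keep_first` and `keep_last` are illustrative names, not part of the answer). Passing a negative period, `shift(-1)`, compares each row with the one after it instead, so it keeps the last row of each run rather than the first. The surviving values are the same either way; only the index labels differ.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 2, 3, 3, 3, 1, 2]})

# shift() moves values down one row, so each row is compared with the row before it;
# the first row of every run of equal values is kept.
keep_first = df["A"] != df["A"].shift()

# shift(-1) moves values up one row, so each row is compared with the row after it;
# the last row of every run is kept instead.
keep_last = df["A"] != df["A"].shift(-1)

print(df[keep_first].index.tolist())  # [0, 1, 2, 3, 5, 8, 9]
print(df[keep_last].index.tolist())   # [0, 1, 2, 4, 7, 8, 9]
```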

For example:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,1,2,2,3,3,3,1,2]})
>>> df["A"]
0    1
1    2
2    1
3    2
4    2
5    3
6    3
7    3
8    1
9    2
Name: A, dtype: int64
>>> df["A"].shift()
0   NaN
1     1
2     2
3     1
4     2
5     2
6     3
7     3
8     3
9     1
Name: A, dtype: float64
>>> df["A"] != df["A"].shift()
0     True
1     True
2     True
3     True
4    False
5     True
6    False
7    False
8     True
9     True
Name: A, dtype: bool
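Two details of the output above can be checked directly, shown here as a small sketch: the shifted Series comes back as float64 because its first slot is filled with NaN, and that NaN is why row 0 is always kept (NaN compares unequal to everything). The same rule means that if the column itself contains NaN, consecutive NaNs are not dropped by this mask.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 1, 2, 2, 3, 3, 3, 1, 2], name="A")
shifted = s.shift()

# The first slot has no predecessor, so it is filled with NaN,
# which is also why the int64 column becomes float64.
print(shifted.dtype)  # float64

# NaN compares unequal to every value, so row 0 always passes the != test.
print((s != shifted).iloc[0])  # True

# The same rule means consecutive NaNs in the data are NOT treated as duplicates.
t = pd.Series([1.0, np.nan, np.nan, 2.0])
print((t != t.shift()).tolist())  # [True, True, True, True]
```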

Leading up to:

>>> df[df["A"] != df["A"].shift()]
   A
0  1
1  2
2  1
3  2
5  3
8  1
9  2
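Since the question starts from a CSV file, here is a minimal end-to-end sketch. The CSV text is made up to mirror the example above (the asker's actual data isn't shown), and `drop_consecutive_duplicates` is a hypothetical helper name. With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` buffer.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the asker's file.
csv_text = "A\n1\n2\n1\n2\n2\n3\n3\n3\n1\n2\n"


def drop_consecutive_duplicates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Keep the first row of each run of equal consecutive values in `column`."""
    mask = df[column] != df[column].shift()
    # reset_index(drop=True) renumbers the surviving rows 0..n-1 and discards the old index.
    return df[mask].reset_index(drop=True)


df = pd.read_csv(io.StringIO(csv_text))
result = drop_consecutive_duplicates(df, "A")
print(result["A"].tolist())  # [1, 2, 1, 2, 3, 1, 2]
```

The `reset_index` step is optional; leave it out if you want to keep the original row labels as in the transcript. If a row should only count as a duplicate when several columns all repeat, one option is to use `(df[cols] != df[cols].shift()).any(axis=1)` as the mask.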

