Pandas DataFrame - delete rows that have same value at a particular column as a previous row

Question

I have a pandas DataFrame. For each row, I want to check whether it has the same value in a particular column (let's call it product_type) as the previous row, and if it does, delete it. In other words, out of a group of consecutive rows with the same value in that column, I want to keep only one.

For example, if column A is the one in which we don't want consecutive duplicates:

See related: stackoverflow.com/questions/19463985/…

– EdChum, Commented Jul 25, 2014 at 7:11

DSM · Accepted Answer · 2014-07-24 21:52:15Z


It's a little tricky, but you could do something like:

>>> df.groupby((df["A"] != df["A"].shift()).cumsum().values).first()
   A   B    C
1  0   1    1
2  2   1   10
3  0  11  100
4  5   2  200
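The one-liner above packs three steps together, so here is a sketch that splits them apart. The sample frame is hypothetical (it is not the asker's original data), and the names `starts`, `run_id`, and `result` are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen for illustration only.
df = pd.DataFrame({
    "A": [0, 0, 2, 2, 2, 0, 5, 5],
    "B": [1, 3, 1, 4, 6, 11, 2, 9],
    "C": [1, 2, 10, 20, 30, 100, 200, 300],
})

# Step 1: True wherever A differs from the previous row.
# Row 0 is compared against NaN, so it always counts as a new run.
starts = df["A"] != df["A"].shift()

# Step 2: a running count of run starts labels each consecutive run.
# Here the labels are 1, 1, 2, 2, 2, 3, 4, 4.
run_id = starts.cumsum()

# Step 3: group by the run label and keep the first row of each run.
# Passing .values makes the labels a plain array, so the result's
# index is the run label (1, 2, 3, ...), not the original row index.
result = df.groupby(run_id.values).first()
print(result)
```

Two things to be aware of with this approach: the original index is replaced by the run labels, and `GroupBy.first()` takes the first non-null entry of each column, so a run whose first row contains NaN may end up with values drawn from different rows.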


3 Comments

Baron Yugovich Over a year ago

How about this: df = df[df['A'] != df.shift(-1)['A']]

furas Over a year ago

@BaronYugovich I would rather do df = df[df['A'] != df['A'].shift(-1)]: select ['A'] first, then call shift(-1), so that only one column is shifted rather than the whole DataFrame.
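As a rough comparison of the boolean-mask variants discussed in these comments, the sketch below uses a hypothetical frame (not the asker's data). Note that `shift(-1)` compares each row with the next one, so it keeps the last row of each run, while a plain `shift()` compares with the previous row and keeps the first. The names `keep_last` and `keep_first` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, chosen for illustration only.
df = pd.DataFrame({
    "A": [0, 0, 2, 2, 2, 0, 5, 5],
    "B": [1, 3, 1, 4, 6, 11, 2, 9],
})

# Compare with the NEXT row: the last row of each run survives.
keep_last = df[df["A"] != df["A"].shift(-1)]

# Compare with the PREVIOUS row: the first row of each run survives.
keep_first = df[df["A"] != df["A"].shift()]

print(keep_last)   # rows with original index 1, 4, 5, 7
print(keep_first)  # rows with original index 0, 2, 5, 6
```

Unlike the groupby version, both masks preserve the original row index and never mix values from different rows.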

Pan Over a year ago

Does this solution only remove one consecutive duplicate? What if there are more than two consecutive rows with the same value in A?
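One way to check the question about longer runs is to try a hypothetical frame with a run of four equal values. The sketch below (illustrative names and data, not from the original post) suggests that both the groupby approach and the `shift()` mask collapse a run of any length to a single row, because every row after the first in a run equals its predecessor.

```python
import pandas as pd

# Hypothetical data: a run of four 7s, a single 3, then a run of two 7s.
df = pd.DataFrame({
    "A": [7, 7, 7, 7, 3, 7, 7],
    "B": list("abcdefg"),
})

# Label each consecutive run of equal values in A.
run_id = (df["A"] != df["A"].shift()).cumsum()

# Number of rows in each run, ordered by run label.
run_lengths = df.groupby(run_id.values).size().tolist()

# Groupby approach from the accepted answer.
grouped = df.groupby(run_id.values).first()

# Boolean-mask approach keeping the first row of each run.
masked = df[df["A"] != df["A"].shift()]

print(run_lengths)           # [4, 1, 2]
print(grouped["B"].tolist())  # ['a', 'e', 'f']
print(masked["B"].tolist())   # ['a', 'e', 'f']
```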
