I have a dataframe from which I want to remove consecutive duplicate values, but only when the value is 'true' or 'false'. I know how to remove consecutive duplicate rows in general, but I am not sure how to restrict the removal to 'true' and 'false' while keeping all other consecutive duplicates. This is what I have so far:

cols =['col_b']
df = df.loc[(df[cols].shift() != df[cols]).any(axis=1)]

For example:

col_a   col_b
21     'true'
25      'true'
76      'abc'
89      'ttt'
99      'ttt'
210     'false'
211     'false'
212     'false'

And I need the following result:

col_a   col_b
21     'true'
76      'abc'
89      'ttt'
99      'ttt'
210     'false'

But my current code also removes the repeated 'ttt' row, which I need to keep.

2 Answers

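Use shift with cumsum to label each run of consecutive equal values, then combine duplicated (which marks every row after the first in a run) with the condition that the value is 'true' or 'false':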

s1 = df.col_b.ne(df.col_b.shift()).cumsum().duplicated()
s2 = df.col_b.isin(["'true'","'false'"])
df=df[~(s1&s2)]

df
   col_a    col_b
0     21   'true'
2     76    'abc'
3     89    'ttt'
4     99    'ttt'
5    210  'false'
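
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame construction is my assumption based on the table in the question (with the quote characters stored as part of the strings), and the names run_id, is_repeat and is_flag are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; assumes the quotes are part of the values.
df = pd.DataFrame({
    "col_a": [21, 25, 76, 89, 99, 210, 211, 212],
    "col_b": ["'true'", "'true'", "'abc'", "'ttt'", "'ttt'",
              "'false'", "'false'", "'false'"],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum turns those starts into one integer label per run.
run_id = df["col_b"].ne(df["col_b"].shift()).cumsum()

# True for every row except the first row of its run.
is_repeat = run_id.duplicated()

# True only for the two values whose consecutive repeats should be dropped.
is_flag = df["col_b"].isin(["'true'", "'false'"])

# Keep everything that is not a repeated 'true'/'false' row.
result = df[~(is_repeat & is_flag)]
print(result)
```

Both 'true' and 'false' runs are collapsed to their first row, while the repeated 'ttt' row survives because it fails the isin check.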

1 Comment

It should remove consecutive duplicates of both 'true' and 'false', not just of 'false'.
1

All you need is an additional filter that restricts the comparison to the values true and false.

>>> df["before"] = df["col_b"].shift()
>>> df
   col_a  col_b before
0     21   true    NaN
1     25   true   true
2     76    abc   true
3     89    ttt    abc
4     99    ttt    ttt
5    210  false    ttt
6    211  false  false
7    212  false  false
>>> df[~((df["col_b"] == df["before"]) & (df["before"].isin(["true", "false"])))].drop(["before"], axis="columns")
   col_a  col_b
0     21   true
2     76    abc
3     89    ttt
4     99    ttt
5    210  false
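
The same filter can also be written without adding a temporary column. This is only a sketch: the helper name drop_consecutive and its parameters are hypothetical, and the sample data assumes the unquoted strings shown in the output above.

```python
import pandas as pd


def drop_consecutive(df, column, values):
    """Drop rows that repeat the previous row's value, but only for `values`.

    Hypothetical helper for illustration; not part of pandas.
    """
    previous = df[column].shift()
    # eq() is False when compared with NaN, so the first row is always kept.
    repeats_previous = df[column].eq(previous)
    is_target = df[column].isin(values)
    return df[~(repeats_previous & is_target)]


# Sample data rebuilt from the question, assuming plain unquoted strings.
df = pd.DataFrame({
    "col_a": [21, 25, 76, 89, 99, 210, 211, 212],
    "col_b": ["true", "true", "abc", "ttt", "ttt", "false", "false", "false"],
})

result = drop_consecutive(df, "col_b", ["true", "false"])
print(result)
```

Because the shifted values live in a local variable, the original frame is left untouched and there is nothing to drop afterwards.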
