How to delete sequence of rows with same value with a condition?

Question

I have the following dataframe:

.	id	outcome
0	3	no
1	3	no
2	3	no
3	3	yes
4	3	no
5	5	no
6	5	no
7	5	yes
8	5	yes
9	6	no
10	6	no
11	6	yes
12	6	yes
13	6	yes
14	6	yes
15	6	yes
16	6	no
17	6	no

Within each 'id' group, I would like to delete all 'yes' rows that form the last run of the outcome column, i.e. a run of 'yes' at the very end of the group.

I would also like to drop all 'no' rows that come at the start of each group.

Both rules must be applied per 'id'. This should be the output:

.	id	outcome
3	3	yes
4	3	no
11	6	yes
12	6	yes
13	6	yes
14	6	yes
15	6	yes
16	6	no
17	6	no

At the moment I have tried this:

import pandas as pd

df = pd.DataFrame(data={
       'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6], 
       'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no']
     })

m1 = df.groupby(['id'])['outcome'].head() != 'yes'
df = df.drop(m1[m1].index)
m2 = df.groupby(['id'])['outcome'].tail() != 'no'
df = df.drop(m2[m2].index)

print(df)

If I put a 1 in head() and tail(), this just removes the last value and not the preceding values. Any suggestions?

mozway · Accepted Answer · 2021-11-14 17:52:21Z

You need to compute masks and slice. In summary, I compute the rank of each stretch of consecutive yes/no values within a group to determine whether it is the initial stretch (rank 1) or the final one (the maximum rank in its group).

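o = df['outcome']
g = df.groupby('id')['outcome']
# rank of each stretch of identical consecutive values within an id (1, 2, 3, ...)
m1 = o.ne(g.shift()).groupby(df['id']).cumsum()
# rank of the last stretch in each id
m2 = m1.groupby(df['id']).transform('max')
# drop 'no' rows in the first stretch and 'yes' rows in the last stretch
df[~((m1.eq(1) & o.eq('no')) | (m1.eq(m2) & o.eq('yes')))]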

Output:

    id outcome
3    3     yes
4    3      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
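To see how the two masks line up with the rows, here is a self-contained sketch that rebuilds the question's dataframe and prints the intermediate values next to the data. The names `starts`, `run`, `last` and `steps` are mine, chosen for illustration; `run` and `last` correspond to `m1` and `m2` above. It assumes the outcome column only holds plain 'yes'/'no' strings.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

o = df['outcome']
# True where a row differs from the previous row of the same id.
# The first row of each id is compared with a missing value, so it is True.
starts = o.ne(df.groupby('id')['outcome'].shift())
# Running count of stretch starts per id: 1 for the first stretch, 2 for the next, ...
run = starts.groupby(df['id']).cumsum()
# Rank of the last stretch in each id, repeated on every row of that id.
last = run.groupby(df['id']).transform('max')

# Rows to drop: 'no' in the first stretch, or 'yes' in the last stretch.
steps = df.assign(
    run=run,
    last=last,
    drop=(run.eq(1) & o.eq('no')) | (run.eq(last) & o.eq('yes')),
)
result = steps.loc[~steps['drop'], ['id', 'outcome']]

print(steps)
print(result)
```

For id 5 every row ends up flagged: the two leading 'no' rows are in stretch 1 and the two 'yes' rows are in the last stretch, which is why that id disappears from the output.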

NB. The final mask used in slicing could be simplified using boolean arithmetic, but I left it as is to keep the conditions clear.
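As an illustration of what such a simplification might look like (this is one possible reading, not necessarily the form intended above), the sketch below compares the original mask with two rewrites. The names `original`, `de_morgan` and `compact` are made up for this example, and the `compact` form assumes the column contains only 'yes' and 'no'.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

o = df['outcome']
m1 = o.ne(df.groupby('id')['outcome'].shift()).groupby(df['id']).cumsum()
m2 = m1.groupby(df['id']).transform('max')

# Mask of rows to keep, written as in the answer.
original = ~((m1.eq(1) & o.eq('no')) | (m1.eq(m2) & o.eq('yes')))

# De Morgan's laws push the negation inside: not (A or B) == (not A) and (not B).
de_morgan = (m1.ne(1) | o.ne('no')) & (m1.ne(m2) | o.ne('yes'))

# Assuming only 'yes'/'no' occur: compare each row's rank with the rank that
# would get it dropped (the last rank for 'yes' rows, 1 for all other rows).
compact = m1.ne(m2.where(o.eq('yes'), 1))

print(original.tolist() == de_morgan.tolist())
print(original.tolist() == compact.tolist())
print(df[compact])
```

The De Morgan form is equivalent for any data; the `compact` form would also drop first-stretch rows holding some third value, so it is only a drop-in replacement when the column is strictly binary.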

edited Nov 14, 2021 at 17:52

answered Nov 14, 2021 at 16:33

mozway

5 Comments

Ze0ruso Over a year ago

Hi, thanks for the details. I re-ran this and it doesn't keep all the following 'no' values after the 'yes' outcomes. For example, there could be more than one 'no' after the 'yes'.

mozway Over a year ago

Can you provide an updated example?

Ze0ruso Over a year ago

yep, just added an extra row to the end of the table

mozway Over a year ago

@TSRAI I'm confused, when I run my code, this gives the expected output (I updated the output based on your new df)

Ze0ruso Over a year ago

apologies, I checked on an updated df!
