How to apply multiple conditions to drop/select specific rows from a dataframe (pandas)?

Question

I have the following dataframe:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes

The rows are grouped by id and, within each id, sorted by date in ascending order.

There are a few conditions I want to satisfy.

I want to remove a current row if the row after it has the same outcome.

If a row is 'yes', then the next row must be the FIRST 'no'.

I must also have the LAST 'yes' value.

Additionally, I want to keep the LAST 'no' above a 'yes', so there can be 2 'no' values above a 'yes': in a run of consecutive 'no's, the first and the last one.

This is the desired outcome for the above dataframe:

    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
8    5     yes
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes

At the moment I have created several masks like this:

import pandas as pd

df = pd.DataFrame(data={'id':[3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
     'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no','no','no','yes','yes','no','no','no','yes']})


m1 = df['outcome']  # the outcome column (a Series)
m2 = m1.groupby(df['id']).shift(-1)  # next row's outcome within the same id (values shifted up by 1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')  # True for a 'no' that directly follows a 'yes' within the same id

df2 = df[~m1.eq(m2)|m3]
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()
df3 = df2[~m4.eq(m5)]

With the above, however, I do not keep the first and last 'no's above a 'yes'.
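One way to see exactly which rows the attempt above loses is to compare the index of its result with the index of the desired outcome. The following is only a sketch: the dataframe is rebuilt from the 27-row table shown in the question, and the names `expected`, `missing` and `extra` are introduced here for illustration.

```python
import pandas as pd

# Data rebuilt from the table in the question (27 rows, index 0-26).
df = pd.DataFrame({
    'id': [3] * 5 + [5] * 4 + [6] * 10 + [7] * 8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

# The two-pass filter from the question, unchanged.
m1 = df['outcome']
m2 = m1.groupby(df['id']).shift(-1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')
df2 = df[~m1.eq(m2) | m3]
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()
df3 = df2[~m4.eq(m5)]

# Index labels of the desired outcome listed in the question.
expected = pd.Index([2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25, 26])

missing = expected.difference(df3.index).tolist()  # wanted but dropped
extra = df3.index.difference(expected).tolist()    # kept but not wanted
print(missing, extra)
```

On this data the first pass (`df2`) still contains row 25. It is the second pass that drops it, because after the first filtering its previous surviving row within id 7 (row 23) is also 'no'.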

BENY · Accepted Answer · 2021-11-12 01:35:17Z

You are on the right track with this approach.

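g = df.groupby('id')['outcome']
# cond1: the previous or the next row within the same id is 'yes'
cond1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
# keep each 'no' adjacent to a 'yes', plus each 'yes' whose previous row is not 'yes'
out = df[cond1 & df.outcome.ne('yes') | (df.outcome.eq('yes') & g.shift().ne('yes'))]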



    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
7    5     yes
10   6      no
11   6     yes
16   6      no
20   7      no
21   7     yes
23   7      no
25   7      no
26   7     yes

answered Nov 12, 2021 at 1:35

BENY


1 Comment

Ze0ruso Over a year ago

Hey, I re-ran this and noticed an issue: I don't seem to get the last "yes". For example, row 7 of your output should really be row 8.
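A possible adjustment for the issue raised in this comment, offered as a sketch rather than as the answerer's own fix: test each 'yes' against the next row in its id instead of the previous one, so that the last 'yes' of a run is the one kept. The dataframe is rebuilt from the table in the question, the column is assumed to contain only 'yes' and 'no', and the names `prev_row`, `next_row`, `keep_no` and `keep_yes` are introduced here for illustration.

```python
import pandas as pd

# Data rebuilt from the table in the question (27 rows, index 0-26).
df = pd.DataFrame({
    'id': [3] * 5 + [5] * 4 + [6] * 10 + [7] * 8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

g = df.groupby('id')['outcome']
prev_row = g.shift()    # outcome of the previous row within the same id
next_row = g.shift(-1)  # outcome of the next row within the same id
is_yes = df['outcome'].eq('yes')

# Keep a 'no' when it sits directly before or directly after a 'yes'.
keep_no = ~is_yes & (prev_row.eq('yes') | next_row.eq('yes'))
# Keep a 'yes' only when the next row in the id is not 'yes'
# (NaN at the end of a group also counts as "not 'yes'").
keep_yes = is_yes & next_row.ne('yes')

out = df[keep_no | keep_yes]
print(out.index.tolist())
```

On the question's data this selects rows 2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25 and 26, which is the desired outcome listed in the question. The only change from the accepted answer is `shift(-1)` in place of `shift()` in the 'yes' condition.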
