
I have the following dataframe:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes

The rows are grouped by id and sorted by date in ascending order.

There are a few conditions I want to satisfy.

I want to remove a row if the row after it has the same outcome.

If a row is 'yes', then the next row kept must be the FIRST 'no' that follows it.

Within a run of consecutive 'yes' rows, I must keep the LAST 'yes'.

Additionally, I want to keep the LAST 'no' above a 'yes'. So when a run of 'no's sits between two 'yes' rows, both its first and its last 'no' are kept.

This is the desired outcome for the above dataframe:

    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
8    5     yes
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes

At the moment I have created several masks like this:

import pandas as pd

# Same 27 rows as the frame shown above
df = pd.DataFrame(data={
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no','no','no','yes','no',
                'no','no','yes','yes',
                'no','no','yes','yes','yes','yes','yes','no','no','no',
                'no','no','yes','yes','no','no','no','yes']})

m1 = df['outcome']                    # the outcome column (a Series)
m2 = m1.groupby(df['id']).shift(-1)   # next outcome within the same id
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')  # a 'no' directly after a 'yes'

df2 = df[~m1.eq(m2) | m3]             # last row of each run, plus the first 'no' after a 'yes'
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()    # previous outcome among the rows kept in df2
df3 = df2[~m4.eq(m5)]                 # drop rows that repeat the previous kept outcome

With the above, however, I do not keep the first and last 'no's above a 'yes'.
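
To see where the row goes missing, here is a minimal sketch that repeats the masks and compares the index of df2 with that of df3. It assumes the 27-row frame displayed at the top (rebuilt here so the snippet runs on its own); the name `dropped` is introduced only for illustration.

```python
import pandas as pd

# Assumption: the 27-row frame displayed at the top of the question
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no','no','no','yes','no',
                'no','no','yes','yes',
                'no','no','yes','yes','yes','yes','yes','no','no','no',
                'no','no','yes','yes','no','no','no','yes']})

m1 = df['outcome']
m2 = m1.groupby(df['id']).shift(-1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')

df2 = df[~m1.eq(m2) | m3]
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()
df3 = df2[~m4.eq(m5)]

# Rows that survive the first filter but are removed by the second one
dropped = df2.index.difference(df3.index)
print(df2.index.tolist())
print(dropped.tolist())
```

On this data the second filter removes rows 18 and 25. Row 18 is a trailing 'no' that should go, but row 25 is the last 'no' above a 'yes', which is the row I want to keep.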

1 Answer


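You are on the right track. Keep a 'no' when the row before or after it (within the same id) is a 'yes', and keep a 'yes' when the row before it is not a 'yes':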

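g = df.groupby('id')['outcome']
# True where the previous or the next row within the same id is 'yes'
cond1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
# keep a 'no' next to a 'yes', or a 'yes' whose previous row is not 'yes'
out = df[(cond1 & df.outcome.ne('yes')) | (df.outcome.eq('yes') & g.shift().ne('yes'))]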



    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
7    5     yes
10   6      no
11   6     yes
16   6      no
20   7      no
21   7     yes
23   7      no
25   7      no
26   7     yes

1 Comment

Hey, I re-ran this and noticed an issue: I don't seem to get the last "yes" of each run. For example, row 7 of your output should really be row 8.
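
Following up on the comment, here is a minimal sketch that keeps the last 'yes' of each run instead of the first. It assumes the 27-row frame from the question (rebuilt here so the snippet is self-contained). The only change to the answer's logic is that the 'yes' branch looks at the next row, `g.shift(-1)`, rather than the previous one; the names `is_yes`, `keep_no` and `keep_yes` are introduced here for illustration.

```python
import pandas as pd

# Assumption: the 27-row frame displayed in the question
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no','no','no','yes','no',
                'no','no','yes','yes',
                'no','no','yes','yes','yes','yes','yes','no','no','no',
                'no','no','yes','yes','no','no','no','yes']})

g = df.groupby('id')['outcome']
prev_outcome = g.shift()      # previous outcome within the same id (NaN at group start)
next_outcome = g.shift(-1)    # next outcome within the same id (NaN at group end)

is_yes = df['outcome'].eq('yes')

# A 'no' is kept when it sits directly before or directly after a 'yes'
keep_no = ~is_yes & (prev_outcome.eq('yes') | next_outcome.eq('yes'))
# A 'yes' is kept when the next row in the group is not 'yes' (last of its run)
keep_yes = is_yes & next_outcome.ne('yes')

out = df[keep_no | keep_yes]
print(out)
```

On the example data this selects rows 2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25 and 26, which matches the desired output in the question.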
