I have the following dataframe:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no

I would like to delete the 'yes' rows that come last in the outcome column, i.e. a final run of 'yes' values.

I would also like to drop the 'no' rows that come first, i.e. a leading run of 'no' values.

Both rules must be applied per group of the 'id' column.
This should be the output:

    id outcome
3    3     yes
4    3      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no

At the moment I have tried this:

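import pandas as pd

df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no']
})

# head() with no argument looks at (up to) the first 5 rows of each id; drop the 'no' rows among them
m1 = df.groupby(['id'])['outcome'].head() != 'yes'
df = df.drop(m1[m1].index)
# tail() looks at (up to) the last 5 remaining rows of each id; drop the 'yes' rows among them
m2 = df.groupby(['id'])['outcome'].tail() != 'no'
df = df.drop(m2[m2].index)

print(df)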

If I pass 1 to head() and tail(), only a single value per group is removed at each end, not the whole run of values. Any suggestions?
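A minimal check, using the question's data, of why the head(1)/tail(1) variant can only ever remove one row per group at each end: both calls select exactly one row per id, however long the leading or trailing run is.

```python
import pandas as pd

# The question's data
df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

# head(1) / tail(1) return exactly one row per id, regardless of run length
first_rows = df.groupby('id')['outcome'].head(1)
last_rows = df.groupby('id')['outcome'].tail(1)

print(first_rows.index.tolist())  # [0, 5, 9]
print(last_rows.index.tolist())   # [4, 8, 17]
```

For id 6, for example, only index 9 is inspected at the start, so the second leading 'no' (index 10) survives. Removing a whole run needs a mask that identifies runs rather than positions.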

1 Answer

You need to compute boolean masks and slice with them. In short, I number (rank) each consecutive stretch of yes/no values within each id, and use that number to tell whether a stretch is initial (rank 1) or final (the maximum rank in its group).

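o = df['outcome']
g = df.groupby('id')['outcome']
# number each consecutive stretch of equal values, restarting at 1 in every id
m1 = o.ne(g.shift()).groupby(df['id']).cumsum()
# number of the last stretch in each id
m2 = m1.groupby(df['id']).transform('max')
# drop leading 'no' rows (stretch 1) and trailing 'yes' rows (last stretch)
df[~((m1.eq(1)&o.eq('no'))|(m1.eq(m2)&o.eq('yes')))]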

Output:

    id outcome
3    3     yes
4    3      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
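To make the stretch numbering concrete, here is a small sketch that rebuilds the question's data and prints the answer's intermediate Series `m1` and `m2` next to it (the column names `run` and `last_run` are only illustrative labels added for display).

```python
import pandas as pd

# The question's data
df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

o = df['outcome']
g = df.groupby('id')['outcome']
# True at the first row of each id (the shifted value is NaN there)
# and wherever the outcome differs from the previous row of the same id
new_run = o.ne(g.shift())
# Cumulative count of run starts within each id: 1, 2, 3, ...
m1 = new_run.groupby(df['id']).cumsum()
# Highest run number in each id, broadcast back to every row of that id
m2 = m1.groupby(df['id']).transform('max')

print(df.assign(run=m1, last_run=m2))
```

Rows where `run` is 1 and the outcome is 'no' are the leading 'no' rows; rows where `run` equals `last_run` and the outcome is 'yes' are the trailing 'yes' rows. For id 3 the last stretch is the single 'no' at index 4, so its 'yes' at index 3 is kept, while for id 5 every row falls into one of the two dropped categories.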

NB. The final mask used for slicing could be simplified with boolean algebra, but I left it as is so that each condition stays clear.
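One possible rewrite, sketched here only as an illustration: applying De Morgan's laws removes the outer negation, so the mask states directly which rows to keep. It is checked against the original mask rather than assumed equivalent.

```python
import pandas as pd

# The question's data
df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

o = df['outcome']
g = df.groupby('id')['outcome']
m1 = o.ne(g.shift()).groupby(df['id']).cumsum()
m2 = m1.groupby(df['id']).transform('max')

# Mask as written in the answer: NOT (leading 'no' OR trailing 'yes')
explicit = ~((m1.eq(1) & o.eq('no')) | (m1.eq(m2) & o.eq('yes')))
# De Morgan: not (A and B) == (not A) or (not B), and not (X or Y) == (not X) and (not Y)
simplified = (m1.ne(1) | o.ne('no')) & (m1.ne(m2) | o.ne('yes'))

assert explicit.equals(simplified)
print(df[simplified])
```

Whether this reads better than the explicit version is a matter of taste; the original keeps the two "drop" conditions visible, which is why the answer leaves it as is.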

5 Comments

Hi, thanks for the details. I re-ran this and it doesn't keep all the 'no' values that follow the 'yes' outcomes; for example, there could be more than one 'no' after the 'yes'.
Can you provide an updated example?
Yep, I just added an extra row to the end of the table.
@TSRAI I'm confused: when I run my code, it gives the expected output (I updated the output based on your new df).
Apologies, I checked on an updated df!
