I have the following dataframe:
    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5      no
9    5     yes
10   6      no
11   6      no
12   6     yes
13   6      no
14   6      no
Within each id, I want to remove the no outcomes that come before the first yes and keep all other no outcomes, so the output dataframe looks like this:
    id outcome
3    3     yes
4    3      no
7    5     yes
8    5      no
9    5     yes
12   6     yes
13   6      no
14   6      no
At the moment I have tried this:
import pandas as pd

df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no', 'no']
})
# Boolean mask per id group: True wherever outcome is not 'no'
df = df[df.groupby('id').outcome.transform(lambda x: x.ne('no'))]
However, this simply removes all no outcomes.
I think I need to find the index of the no rows that precede the first yes in each id group and drop only those rows from the dataframe. Any suggestions?
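For reference, here is a minimal sketch of one way this might be done, assuming "start of a sequence" means every row before the first yes within each id. The names `is_yes`, `seen_yes` and `result` are made up for illustration. It keeps a running count of yes rows per id and keeps a row once that count is above zero:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6],
    "outcome": ["no", "no", "no", "yes", "no",
                "no", "no", "yes", "no", "yes",
                "no", "no", "yes", "no", "no"],
})

# 1 where the outcome is 'yes', 0 otherwise
is_yes = df["outcome"].eq("yes").astype(int)

# Running count of 'yes' rows within each id; it stays 0 until the first 'yes'
seen_yes = is_yes.groupby(df["id"]).cumsum().gt(0)

# Keep rows from the first 'yes' onward in each id (original index is preserved)
result = df[seen_yes]
print(result)
```

Note that under this assumption an id with no yes at all would be dropped entirely, which may or may not be what is wanted.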