I have the following dataframe:
| . | id | outcome |
|---|---|---|
| 0 | 3 | no |
| 1 | 3 | no |
| 2 | 3 | no |
| 3 | 3 | yes |
| 4 | 3 | no |
| 5 | 5 | no |
| 6 | 5 | no |
| 7 | 5 | yes |
| 8 | 5 | yes |
| 9 | 6 | no |
| 10 | 6 | no |
| 11 | 6 | yes |
| 12 | 6 | yes |
| 13 | 6 | yes |
| 14 | 6 | yes |
| 15 | 6 | yes |
| 16 | 6 | no |
| 17 | 6 | no |
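Within each 'id' group, I would like to drop the trailing 'yes' rows, i.e. every 'yes' that comes after the last 'no' in that group.
I would also like to drop the leading 'no' rows, i.e. every 'no' that comes before the first 'yes' in that group.
Both rules must be applied per group of the 'id' column, so a group with no rows left (such as id 5) disappears entirely.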
This should be the output:
| . | id | outcome |
|---|---|---|
| 3 | 3 | yes |
| 4 | 3 | no |
| 11 | 6 | yes |
| 12 | 6 | yes |
| 13 | 6 | yes |
| 14 | 6 | yes |
| 15 | 6 | yes |
| 16 | 6 | no |
| 17 | 6 | no |
At the moment I have tried this:
import pandas as pd

df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no']
})
# Attempt: drop rows at the start of each group that are not 'yes'
m1 = df.groupby(['id'])['outcome'].head() != 'yes'
df = df.drop(m1[m1].index)
# Attempt: drop rows at the end of each group that are not 'no'
m2 = df.groupby(['id'])['outcome'].tail() != 'no'
df = df.drop(m2[m2].index)
print(df)
If I pass 1 to head() and tail(), only the single row at each end of a group is removed, not the rest of the leading 'no' or trailing 'yes' run. Any suggestions?
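
For reference, here is a minimal sketch of the behaviour I am after, written with per-group cumulative sums instead of head()/tail(). It assumes the column only ever contains 'yes' and 'no' and that rows are already in the intended order within each id. The names `seen_yes`, `no_ahead` and `result` are just illustrative, and this is one possible approach rather than necessarily the best one.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

is_yes = df['outcome'].eq('yes')
is_no = ~is_yes  # assumption: every value is either 'yes' or 'no'

# True from the first 'yes' of each id onward, so leading 'no' rows stay False.
seen_yes = is_yes.groupby(df['id']).cumsum().gt(0)

# Same idea on the reversed rows: True up to the last 'no' of each id,
# so trailing 'yes' rows stay False. The final iloc[::-1] restores row order.
no_ahead = (
    is_no.iloc[::-1]
    .groupby(df['id'].iloc[::-1])
    .cumsum()
    .gt(0)
    .iloc[::-1]
)

# Keep a row only if a 'yes' has occurred at or before it
# and a 'no' occurs at or after it, within the same id.
result = df[seen_yes & no_ahead]
print(result)
```

On the sample data this keeps rows 3 and 4 for id 3, nothing for id 5, and rows 11 through 17 for id 6, which matches the expected output above.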