I am new to Python and do not know how to combine the groupby and duplicated functions to solve my problem. Grouping by the Year column, I need to de-duplicate the Negative values in the Result column: keep only the first Negative in each group and set the rest to NaN (see the tables below).
Here is my DataFrame:
df = pd.DataFrame({"Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005", "2003", "2003", "2003", "2003"], "Result": ["NaN", "Negative", "Negative", "Negative", "NaN", "Negative", "NaN", "Negative", "NaN", "Negative", "NaN"]})
I used this code, which doesn't work:
df['Result'] = df.groupby(['Year'])['Result'].duplicated()
The original table looks like this:
| Year | Result |
|---|---|
| 2004 | NaN |
| 2004 | Negative |
| 2004 | Negative |
| 2005 | Negative |
| 2005 | NaN |
| 2005 | Negative |
| 2005 | NaN |
| 2003 | Negative |
| 2003 | NaN |
| 2003 | Negative |
| 2003 | NaN |
But I want to de-duplicate the 'Negative' values in the 'Result' column within each 'Year' group and write the result back to the 'Result' column, so that it looks like this:
| Year | Result |
|---|---|
| 2004 | NaN |
| 2004 | Negative |
| 2004 | NaN |
| 2005 | Negative |
| 2005 | NaN |
| 2005 | NaN |
| 2005 | NaN |
| 2003 | Negative |
| 2003 | NaN |
| 2003 | NaN |
| 2003 | NaN |
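
For reference, here is a minimal sketch of one way the transformation above might be done. It is not necessarily the only or best approach. As far as I can tell, groupby objects do not provide a duplicated method, so the sketch calls DataFrame.duplicated on the (Year, Result) pair instead, which flags every repeat after the first within each year. It assumes the "NaN" strings in the sample data are meant as missing values and converts them first. The names is_repeat and is_negative are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005",
             "2003", "2003", "2003", "2003"],
    "Result": ["NaN", "Negative", "Negative", "Negative", "NaN", "Negative",
               "NaN", "Negative", "NaN", "Negative", "NaN"],
})

# Assumption: the literal string "NaN" stands for a missing value,
# so turn it into a real NaN before doing anything else.
df["Result"] = df["Result"].mask(df["Result"].eq("NaN"))

# True for every row whose (Year, Result) pair already appeared earlier
# (the default keep="first" leaves the first occurrence unflagged).
is_repeat = df.duplicated(subset=["Year", "Result"])

# Only repeated "Negative" rows should be blanked out.
is_negative = df["Result"].eq("Negative")

# mask() replaces the values where the condition is True with NaN.
df["Result"] = df["Result"].mask(is_negative & is_repeat)

print(df)
```

Because the year is part of the duplicate check, no explicit groupby is needed here. If the rule later needs to be more complex than "first occurrence per year", a groupby-based variant may be the better fit.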