I am new to Python and do not know how to combine the groupby and duplicated functions to solve my problem. Within each Year group, I need to de-duplicate the Negative values in the Result column: keep only the first Negative value that appears and set the rest to NaN (see the tables below).

Here is my dataframe:

import pandas as pd

df = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005", "2003", "2003", "2003", "2003"],
    "Result": ["NaN", "Negative", "Negative", "Negative", "NaN", "Negative", "NaN", "Negative", "NaN", "Negative", "NaN"],
})

I used this code, which doesn't work:

df['Result'] = df.groupby(['Year'])['Result'].duplicated()

The original table looks like this:

Year Result
2004 NaN
2004 Negative
2004 Negative
2005 Negative
2005 NaN
2005 Negative
2005 NaN
2003 Negative
2003 NaN
2003 Negative
2003 NaN

But I want to de-duplicate 'Negative' values in the 'Result' column, grouped by 'Year', and update the 'Result' column, so it looks like below:

Year Result
2004 NaN
2004 Negative
2004 NaN
2005 Negative
2005 NaN
2005 NaN
2005 NaN
2003 Negative
2003 NaN
2003 NaN
2003 NaN

1 Answer

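Use DataFrame.duplicated with Series.mask. duplicated(['Year','Result']) flags every row whose (Year, Result) pair has already appeared in an earlier row, and mask replaces the flagged values with NaN: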

df['Result'] = df['Result'].mask(df.duplicated(['Year','Result']))
print(df)
    Year    Result
0   2004       NaN
1   2004  Negative
2   2004       NaN
3   2005  Negative
4   2005       NaN
5   2005       NaN
6   2005       NaN
7   2003  Negative
8   2003       NaN
9   2003       NaN
10  2003       NaN
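
If Result could ever hold values other than Negative, the one-liner above would blank their repeats as well. A minimal sketch of a variant that only touches repeated Negative rows is shown here, assuming the missing values are real np.nan rather than the string "NaN"; the names is_repeat, is_repeat_negative and via_groupby are illustrative and not part of the original answer. It also shows one way the same idea might be written with groupby, since that is what the question asked about.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with real missing values (an assumption).
df = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005",
             "2003", "2003", "2003", "2003"],
    "Result": [np.nan, "Negative", "Negative", "Negative", np.nan, "Negative",
               np.nan, "Negative", np.nan, "Negative", np.nan],
})
original = df["Result"].copy()

# True for rows whose (Year, Result) pair already occurred in an earlier row.
is_repeat = df.duplicated(["Year", "Result"])

# Keep only the flags that belong to 'Negative' rows.
is_repeat_negative = is_repeat & df["Result"].eq("Negative")

# Replace the flagged values with NaN; everything else is left as it was.
df["Result"] = df["Result"].mask(is_repeat_negative)

# groupby-based alternative: de-duplicate inside each Year group separately.
via_groupby = original.groupby(df["Year"]).transform(
    lambda s: s.mask(s.duplicated())
)

print(df)
```

For this dataset both approaches give the same column. The groupby version calls a Python function once per group, so the duplicated/mask version is likely the better default on large frames.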