I have a dataframe like the following:

df
     Name  Y
0     A    1
1     A    0
2     B    0
3     B    0
5     C    1

I want to drop duplicate values of Name, keeping the row with Y=1 where one exists, so that the result looks like this:

df
     Name  Y
0     A    1
1     B    0
2     C    1

3 Answers

2

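Use the drop_duplicates method after sorting by Y in descending order, so that the default keep='first' retains the row with Y=1:

df.sort_values('Y', ascending=False).drop_duplicates(subset=['Name'])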
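
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the sample data from the question (rebuilt by hand, including the index labels shown there) and adds a final sort by Name plus reset_index so the output matches the requested 0, 1, 2 index; those two extra steps are my addition, not part of the one-liner above.

```python
import pandas as pd

# Sample data rebuilt from the question (index labels copied as shown there).
df = pd.DataFrame(
    {"Name": ["A", "A", "B", "B", "C"], "Y": [1, 0, 0, 0, 1]},
    index=[0, 1, 2, 3, 5],
)

# Sort so rows with Y=1 come first, keep the first row per Name,
# then restore Name order and renumber the index from 0.
result = (
    df.sort_values("Y", ascending=False)
      .drop_duplicates(subset=["Name"])
      .sort_values("Name")
      .reset_index(drop=True)
)

print(result)
```

If the original row labels matter, drop the reset_index call.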

2 Comments

drop_duplicates defaults to keep='first', so your proposal will keep the 0s instead of the 1s. You should either sort in descending order or pass keep='last' to drop_duplicates.
Agreed, will edit.
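
To illustrate the point in the comment above, here is a small sketch, assuming the sample data from the question. It shows the keep='last' alternative with an ascending sort, next to the ascending sort with the default keep='first', which picks the Y=0 row for A.

```python
import pandas as pd

# Sample data rebuilt from the question (default integer index assumed).
df = pd.DataFrame({"Name": ["A", "A", "B", "B", "C"], "Y": [1, 0, 0, 0, 1]})

# Ascending sort puts the Y=1 rows last, so keep="last" retains them.
kept_last = df.sort_values("Y").drop_duplicates(subset=["Name"], keep="last")

# Same ascending sort with the default keep="first": the Y=0 row wins for A.
kept_first = df.sort_values("Y").drop_duplicates(subset=["Name"])

print(kept_last.sort_values("Name"))
print(kept_first.sort_values("Name"))
```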
2

groupby + max

Assuming your Y series consists only of 0 and 1 values:

res = df.groupby('Name', as_index=False)['Y'].max()

print(res)

  Name  Y
0    A  1
1    B  0
2    C  1
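
One caveat: the groupby + max result contains only Name and Y. If the real dataframe has other columns that should come along with the chosen row, a variant using idxmax may help. This is only a sketch: the Z column is hypothetical (it is not in the question), and it assumes the index labels are unique.

```python
import pandas as pd

# "Z" is a hypothetical extra column added for illustration only.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "B", "C"],
    "Y": [1, 0, 0, 0, 1],
    "Z": ["a1", "a2", "b1", "b2", "c1"],
})

# idxmax returns, per Name, the index label of the first row holding the max Y;
# .loc then pulls those full rows, so the other columns are preserved.
rows = df.loc[df.groupby("Name")["Y"].idxmax()].reset_index(drop=True)

print(rows)
```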

1

Does the 'Y' column contain only 0 and 1? If so, you can try the following:

df = df.sort_values(['Y'], ascending=False)
df = df.drop_duplicates(['Name'])
