Python: how to drop duplicates but keep the row with Y=1?

Question

I have a DataFrame like the following:

df
     Name  Y
0     A    1
1     A    0
2     B    0
3     B    0
5     C    1

I want to drop the duplicates of Name, keeping the row that has Y=1, so that the result is:

df
     Name  Y
0     A    1
1     B    0
2     C    1

Alessandro · Accepted Answer · 2018-11-16 11:43:34Z

2

Use the drop_duplicates method after sorting by Y in descending order, so that the Y=1 row comes first for each Name:

df.sort_values('Y', ascending=False).drop_duplicates(subset=['Name'])
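A self-contained sketch of the approach above, rebuilding the sample frame from the question. On its own, the one-liner returns the rows ordered by Y with their original index labels. The trailing `.sort_index().reset_index(drop=True)` is an addition, not part of the original answer. It assumes you also want the rows back in their original order with a fresh 0..n-1 index, as in the expected output.

```python
import pandas as pd

# Sample data from the question (the original index skips 4)
df = pd.DataFrame(
    {"Name": ["A", "A", "B", "B", "C"], "Y": [1, 0, 0, 0, 1]},
    index=[0, 1, 2, 3, 5],
)

res = (
    df.sort_values("Y", ascending=False)  # Y=1 rows come first
    .drop_duplicates(subset=["Name"])     # keep='first' is the default
    .sort_index()                         # restore the original row order
    .reset_index(drop=True)               # renumber the index 0..n-1
)
print(res)
```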

edited Nov 16, 2018 at 11:43

answered Nov 16, 2018 at 10:56

2 Comments

Matina G Over a year ago

drop_duplicates uses keep='first' by default, so your proposal will keep the 0s instead of the 1s. You should either sort in descending order or add a keep='last' argument to drop_duplicates.

Alessandro Over a year ago

Agreed, will edit.
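A sketch of the keep='last' alternative suggested in the comment above: sort Y in ascending order so the Y=1 rows end up last within each Name, then keep the last occurrence. The `kind="mergesort"` argument (a stable sort) and the final `sort_index()` are assumptions added here, so that ties are resolved by the original row order and the result comes back in that order.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "A", "B", "B", "C"], "Y": [1, 0, 0, 0, 1]},
    index=[0, 1, 2, 3, 5],
)

res = (
    df.sort_values("Y", kind="mergesort")             # ascending: Y=1 rows go last
    .drop_duplicates(subset=["Name"], keep="last")    # so the last row per Name is Y=1 when present
    .sort_index()                                     # restore the original row order
)
print(res)
```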

jpp · Accepted Answer · 2018-11-16 11:07:53Z

2

`groupby` + `max`

Assuming your Y series consists only of 0 and 1 values:

res = df.groupby('Name', as_index=False)['Y'].max()

print(res)

  Name  Y
0    A  1
1    B  0
2    C  1
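Note that `groupby` + `max` returns only the `Name` and `Y` columns. If the real frame has other columns that should survive, one option (not part of the original answer) is to select whole rows with `idxmax`, which returns the index label of the first maximum in each group. The `Other` column below is hypothetical and is included only to show that the remaining columns are kept.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["A", "A", "B", "B", "C"],
        "Y": [1, 0, 0, 0, 1],
        "Other": ["a1", "a0", "b0", "b0-dup", "c1"],  # hypothetical extra column
    },
    index=[0, 1, 2, 3, 5],
)

# Aggregation as in the answer: one row per Name, only Name and Y remain
res = df.groupby("Name", as_index=False)["Y"].max()

# Row selection: label of the first max-Y row per Name, then take those full rows
rows = df.loc[df.groupby("Name")["Y"].idxmax()].reset_index(drop=True)
print(res)
print(rows)
```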

answered Nov 16, 2018 at 11:07

Matina G · Accepted Answer · 2018-11-16 11:09:19Z

1

Does the 'Y' column contain only 0s and 1s? In that case, you can try the following:

df = df.sort_values(['Y'], ascending=False)
df = df.drop_duplicates(['Name'])
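A runnable sketch of the two steps above. The `kind="mergesort"` argument is an assumption added here: it requests a stable sort, so among rows with equal Y (the two B rows) the one that appeared first in the original frame is kept. The final `sort_index()` is also an addition that restores the original row order.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "A", "B", "B", "C"], "Y": [1, 0, 0, 0, 1]},
    index=[0, 1, 2, 3, 5],
)

# Y=1 rows first; the stable sort keeps tied rows in their original order
df = df.sort_values(["Y"], ascending=False, kind="mergesort")
# Keep the first row per Name (keep='first' is the default)
df = df.drop_duplicates(["Name"])
# Put the surviving rows back in their original order
df = df.sort_index()
print(df)
```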

answered Nov 16, 2018 at 11:09
