How to filter out duplicates based on various filters

Question

I have a DataFrame with the columns Letters, Numbers and Digits:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

print(df)
  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
2      ZW      333   4234.0
3      ZW      333   4235.0
4      XY        4      NaN

I would like to remove duplicates based on specific columns (here Letters and Numbers), keeping within each group only the row with the largest "Digits" value and deleting the others (rows where "Digits" is smaller or NaN).

So the result would be

print(df)
  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
3      ZW      333   4235.0
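The rule behind this expected output can be written as a boolean mask: keep a row when its Digits equals the maximum Digits of its (Letters, Numbers) group. This is a minimal sketch that is not part of the original question or answer; it assumes that rows tied for the maximum should all be kept (this mask keeps every tied row) and that a group whose Digits are all NaN may be dropped entirely.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN spelled as np.nan.
df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

# Largest Digits within each (Letters, Numbers) group, broadcast back to every row.
# groupby max skips NaN, so the XY group's max is 32534.
group_max = df.groupby(['Letters', 'Numbers'])['Digits'].transform('max')

# NaN == x is always False, so NaN rows never match their group's max.
# Assumption: ties for the max are all kept; an all-NaN group is dropped entirely.
result = df[df['Digits'] == group_max]
print(result)
```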

cs95 · Accepted Answer · 2019-04-15 19:47:33Z


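We can make use of sort_values with the na_position argument so that NaN sorts before every number, which puts the row with the largest Digits last within each (Letters, Numbers) group. Then call drop_duplicates with keep='last' to keep that row, and sort_index to restore the original order: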

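(df.sort_values('Digits', na_position='first')            # NaN first, largest Digits last
   .drop_duplicates(['Letters', 'Numbers'], keep='last')  # keep the last row of each group
   .sort_index())                                         # restore the original row order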

  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
3      ZW      333   4235.0
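For reference, here is a self-contained, runnable version of the approach above, built on the question's sample data. It also shows an alternative that is not part of this answer: groupby(...).idxmax() picks the index label of the largest Digits in each group (NaN is skipped). This is a sketch under the assumption that no (Letters, Numbers) group has only NaN in Digits; the idxmax variant does not handle that case, while the sort-based version would keep the NaN row for such a group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

# Approach from the answer: NaN sorts first, so the last row of each
# (Letters, Numbers) group after sorting holds the largest Digits.
out = (df.sort_values('Digits', na_position='first')
         .drop_duplicates(['Letters', 'Numbers'], keep='last')
         .sort_index())

# Alternative (not from the answer): take the index label of each group's max.
# Assumption: every group has at least one non-NaN Digits value.
alt = df.loc[df.groupby(['Letters', 'Numbers'])['Digits'].idxmax()].sort_index()

print(out)
print(out.equals(alt))
```

On this data both variants keep rows 0, 1 and 3. The sort-based version keeps exactly one row per group even when Digits values tie.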

