
I have a DataFrame with the columns Letters, Numbers and Digits:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

print(df)
  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
2      ZW      333   4234.0
3      ZW      333   4235.0
4      XY        4      NaN

I would like to drop duplicates based on specific columns (here Letters and Numbers), keeping in each group only the row where Digits is the greatest. Rows in a group whose Digits value is smaller or NaN should be deleted; rows without a duplicate (such as AB) stay as they are.

So the result would be:

print(df)
  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
3      ZW      333   4235.0
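The rule above (keep the row with the largest Digits in each Letters/Numbers pair, treating NaN as the smallest value) can also be written directly as a boolean mask. This is a minimal sketch using `groupby(...).transform('max')`, an alternative technique rather than the accepted answer's approach. Two behavioural differences are worth knowing: rows that tie for the maximum are all kept, and a group whose Digits are all NaN is dropped entirely.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

# Largest Digits value in each (Letters, Numbers) group, broadcast back to
# every row of that group; max() skips NaN by default.
group_max = df.groupby(['Letters', 'Numbers'])['Digits'].transform('max')

# Keep rows whose Digits equals the group maximum.
# NaN never compares equal to anything, so NaN rows are removed.
result = df[df['Digits'].eq(group_max)]
print(result)
```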

Answer


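We can sort by Digits with `sort_values` and `na_position='first'`, so NaN comes before every number and the largest value ends up last in each group. Then `drop_duplicates` with `keep='last'` keeps only that last row per (Letters, Numbers) pair, and `sort_index` restores the original row order: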

(df.sort_values('Digits', na_position='first')            # NaN first, largest Digits last
   .drop_duplicates(['Letters', 'Numbers'], keep='last')  # keep the last row per pair
   .sort_index())                                         # restore original row order

  Letters  Numbers   Digits
0      AB     1234  32234.0
1      XY        4  32534.0
3      ZW      333   4235.0
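To see why `keep='last'` picks the largest value, it helps to look at the intermediate frames. This minimal sketch runs the same three calls one at a time on the question's data; the variable names `step1`, `step2` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Letters': ['AB', 'XY', 'ZW', 'ZW', 'XY'],
                   'Numbers': [1234, 4, 333, 333, 4],
                   'Digits': [32234, 32534, 4234, 4235, np.nan]})

# Step 1: ascending sort with NaN placed at the top.
step1 = df.sort_values('Digits', na_position='first')
print(step1)

# Step 2: within each (Letters, Numbers) pair the last row now holds the
# largest Digits, so keep='last' retains exactly that row.
step2 = step1.drop_duplicates(['Letters', 'Numbers'], keep='last')

# Step 3: put the surviving rows back in their original order.
result = step2.sort_index()
print(result)
```

After step 1 the NaN row of the XY pair sits above its 32534 row, and 4234 sits above 4235 for ZW, which is why the last occurrence is always the maximum. Unlike a max-based mask, this approach keeps exactly one row per pair, even when every Digits value in the pair is NaN.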