I am trying to find the highest values of a column in my dataframe. However, as the values contain % they are strings, not integers, which is preventing me from using nlargest. I would like to know if I can convert the strings to integers.

Here is an example of my code:

import pandas as pd
import re

test_data = {
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
}
test_df = pd.DataFrame(
    test_data,
    columns=['Animal', 'Squeak Appeal', 'Richochet Chance'],
)

My attempts to use nlargest:

r_chance = test_df.nlargest(2, ['Richochet Chance'])
# TypeError: Column 'Richochet Chance' has dtype object, cannot use method 'nlargest' with this dtype
r_chance = test_df.nlargest(2, re.sub("[^0-9]", ""(['Richochet Chance'])))
# TypeError: 'str' object is not callable

If there is no sensible way to do this I shan't remain in denial. I just wondered if I could avoid looping through a large df and converting strings to integers for multiple columns.

1 Answer

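Let's strip the % sign, convert that column to floats, and take the index labels of the two largest values. Those labels then select the matching rows from the original dataframe, so the column itself keeps its string values: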

idx = (test_df['Richochet Chance']
          .str.strip('%')          # remove the trailing %
          .astype(float)           # convert the remaining digits to float
          .nlargest(2).index       # index labels of the 2 largest values
      )
test_df.loc[idx]

Output:

    Animal  Squeak Appeal Richochet Chance
1   Turtle           1.92              30%
2  Chicken          11.40              16%
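
Since the question also mentions several columns, here is a minimal sketch of how the same idea might be extended without writing an explicit row loop. The 'Wobble Rate' column, the `percent_to_float` helper and the `pct_cols` list are made up for illustration and are not part of the original data. It assumes every value in those columns is a number followed by %; anything that does not parse becomes NaN because of `errors="coerce"`.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Otter", "Turtle", "Chicken"],
    "Richochet Chance": ["8%", "30%", "16%"],
    "Wobble Rate": ["45%", "5%", "20%"],  # hypothetical second percentage column
})

# Columns assumed to hold strings such as "30%"
pct_cols = ["Richochet Chance", "Wobble Rate"]


def percent_to_float(col: pd.Series) -> pd.Series:
    """Drop the trailing % and parse the rest as a number (NaN if it fails)."""
    return pd.to_numeric(col.str.rstrip("%"), errors="coerce")


# Numeric copy of the percentage columns; df itself is left untouched
numeric = df[pct_cols].apply(percent_to_float)

# For each column, the rows of df holding its two largest values
top2 = {name: df.loc[numeric[name].nlargest(2).index] for name in pct_cols}
```

`top2["Wobble Rate"]` then holds the two rows with the largest 'Wobble Rate', still showing the original strings. `apply` here runs once per column rather than once per row, so the string handling stays vectorized. If you would rather keep the numbers around, assigning `numeric` back into the dataframe under new column names should let you call `df.nlargest` directly.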