I have a dataset with a no_employees column whose values are str objects. What's the best way to create a new column (company_size) in the DataFrame and fill it with values based on the no_employees column, as in the example below?

mental_health_df = pd.read_csv("Mental Health.csv")
pd.set_option('display.max_columns', None)
mental_health_df.head(100)

no_employees     | company_size
6-25             | Small
More than 1000   | Extremely Large
500-1000         | Very Large
26-100           | Medium
100-500          | Large
1-5              | Very Small
1 Answer

Please bin using pd.cut

import numpy as np
df['company_size'] = pd.cut(
    df['no_employees'].astype('category').cat.codes * 10,
    [-np.inf, 9, 19, 29, 39, 49, np.inf],
    labels=['Very Small', 'Large', 'Medium', 'Very Large', 'Small', 'Extremely Large'])
print(df)

    no_employees     company_size
0            6-25            Small
1  More than 1000  Extremely Large
2        500-1000       Very Large
3          26-100           Medium
4         100-500            Large
5             1-5       Very Small

How it works

# Convert no_employees to category codes (the codes follow the sorted string order
# of the categories) and multiply by ten to make the bin edges easier to define
df['no_employees'].astype('category').cat.codes * 10

# Bin the scaled codes using pd.cut
pd.cut(df['no_employees'].astype('category').cat.codes * 10,
       [-np.inf, 9, 19, 29, 39, 49, np.inf],
       labels=['Very Small', 'Large', 'Medium', 'Very Large', 'Small', 'Extremely Large'])
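
To see why the labels are listed in that unusual order, it can help to look at the intermediate codes. The sketch below rebuilds a small DataFrame from the six values in the question's example (an assumption; the real CSV may contain other values or missing entries, which would shift the codes) and prints the categories and the scaled codes.

```python
import pandas as pd

# Sample data assumed from the example table in the question
df = pd.DataFrame({
    "no_employees": ["6-25", "More than 1000", "500-1000",
                     "26-100", "100-500", "1-5"]
})

cat = df["no_employees"].astype("category")

# Categories are sorted as strings, not by company size
categories = list(cat.cat.categories)
print(categories)

# Each row's code is the position of its category in that sorted list,
# scaled by ten as in the answer
codes = (cat.cat.codes * 10).tolist()
print(codes)
```

Because the categories sort as strings ('1-5', '100-500', '26-100', ...), the labels passed to pd.cut have to follow that same order rather than the order of increasing company size. This also means the approach depends on the column containing exactly these six values.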
2 Comments

Could you explain what the [-np.inf,9,19,29,39,49,np.inf] part is doing please?
This defines the bins, i.e. the categories into which the provided values are divided. As per the documentation, when bins is a sequence of scalars it defines the bin edges (an int would instead define the number of equal-width bins in the range of x). np.inf is positive infinity and -np.inf is negative infinity.
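
A minimal sketch of how those edges behave, assuming the scaled codes 0, 10, ..., 50 from the answer: each value falls into the interval between two consecutive edges (right edge included by default), and the label at the same position is assigned.

```python
import numpy as np
import pandas as pd

# Scaled category codes, assumed to be 0, 10, ..., 50 as in the answer
codes = pd.Series([0, 10, 20, 30, 40, 50])
edges = [-np.inf, 9, 19, 29, 39, 49, np.inf]
labels = ['Very Small', 'Large', 'Medium', 'Very Large', 'Small', 'Extremely Large']

# labels=False returns the index of the bin each value falls into
bin_index = pd.cut(codes, edges, labels=False).tolist()
print(bin_index)

# With labels, the bin at position i is named labels[i]
named = pd.cut(codes, edges, labels=labels).astype(str).tolist()
print(named)
```

Seven edges produce six bins, so the labels list must have exactly six entries.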
