
I have read this link: Check which columns in DataFrame are Categorical

I have a DataFrame where salaries have a $ prepended to them, so the Salary column is being shown as categorical data.

Moreover, suppose my nominal data is not stored as strings such as 'F', 'M', etc. How do we then classify which columns are numeric, categorical (with strings), and nominal?

Say my data looks like this:

ID    Gender   Salary   HasPet  
1      M       $250       0
2      F       $5000      0
3      M       $4500      1  
  • Can you add a Minimal, Complete, and Verifiable example? Commented Apr 24, 2016 at 11:45
  • Thank you for adding a sample. What is the desired output? Commented Apr 24, 2016 at 11:55
  • @jezrael I want to know which columns are numeric, categorical (with strings), and nominal. In the linked question, they found the numeric columns. But in the case of salary, the $ sign makes it show as non-numeric, so it gets tagged as categorical data. Commented Apr 24, 2016 at 12:05
  • Suppose your DataFrame is df, how about df.dtypes or df.info()? Commented Apr 24, 2016 at 12:30
  • For Salary it will give me object, I guess. And for nominal data coded as 0/1 it will show me int64. Commented Apr 24, 2016 at 12:41

1 Answer


You are confusing the categorical data type with strings (which pandas shows as object).

A number cannot contain a $ sign, so pandas treats the Salary column as strings. That is the correct behavior!

You can easily convert your Salary column to integer/float if you want, though:

In [180]: df
Out[180]:
   Gender Salary
0       F  $3283
1       M  $6958
2       F  $3721
3       F  $7732
4       M  $7198
5       F  $5475
6       F  $7410
7       M  $8673
8       F  $8582
9       M  $4115
10      F  $8658
11      F  $6331
12      M  $6174
13      F  $6261
14      M  $6212

In [181]: df.dtypes
Out[181]:
Gender    object
Salary    object
dtype: object

Let's remove the leading $ and convert Salary to int:

In [182]: df.Salary = df.Salary.str.lstrip('$').astype(int)

In [183]: df.dtypes
Out[183]:
Gender    object
Salary     int32
dtype: object
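
If some salaries also contain thousands separators or placeholder text, the plain astype(int) call above would raise an error. A minimal sketch of a more forgiving conversion, assuming values such as "$5,000" and "n/a" might appear (these sample values and the SalaryNum column name are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample: a comma separator and a non-numeric placeholder.
df = pd.DataFrame({"Salary": ["$250", "$5,000", "$4500", "n/a"]})

# Drop every "$" and "," character, then let pandas parse what is left.
cleaned = df["Salary"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" turns anything unparseable into NaN instead of raising.
df["SalaryNum"] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
```

Because of the NaN, the resulting column is float rather than int; whether that is acceptable depends on how you want to treat missing salaries.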

You can also convert your Gender column to categorical:

In [186]: df.Gender = df.Gender.astype('category')

In [187]: df.dtypes
Out[187]:
Gender    category
Salary       int32
dtype: object
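
For the 0/1 columns mentioned in the question, pandas has no way of knowing that an int64 column is really nominal, so some rule of your own is needed. The following is only a rough heuristic sketch: the function name classify_columns, the id_cols parameter, and the "at most two distinct values" threshold are all assumptions for illustration, not pandas features. It assumes Salary has already been converted to a number as shown above.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_columns(df, id_cols=(), max_levels=2):
    """Label each column by a simple heuristic (hypothetical helper)."""
    kinds = {}
    for name in df.columns:
        col = df[name]
        if name in id_cols:
            # Identifiers are numeric but carry no quantity.
            kinds[name] = "identifier"
        elif ptypes.is_bool_dtype(col) or (
            ptypes.is_numeric_dtype(col) and col.nunique(dropna=True) <= max_levels
        ):
            # Few distinct numeric values: probably codes such as 0/1.
            kinds[name] = "nominal (numeric codes)"
        elif ptypes.is_numeric_dtype(col):
            kinds[name] = "numeric"
        else:
            # object, string, and category dtypes all land here.
            kinds[name] = "categorical (strings)"
    return kinds


df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["M", "F", "M"],
    "Salary": [250, 5000, 4500],  # already stripped of "$"
    "HasPet": [0, 0, 1],
})

kinds = classify_columns(df, id_cols=("ID",))
for name, kind in kinds.items():
    print(name, "->", kind)
```

The threshold is a judgment call: a genuinely numeric column with only two distinct values would be mislabeled, so it is worth reviewing the result by hand.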