0

If I have a variable with two values (for example, Sex can be male or female), I use code like this:

train_df["Sex"] = train_df["Sex"].apply(lambda sex: 0 if sex == 'male' else 1)

to convert the strings to integers. How do I do this when the variable takes more than two values, such as Salary categorised as low/medium/high? How can I assign values in the same way?

Comment: use a map (dictionary). (Dec 20, 2017 at 8:31)

2 Answers

5

Use map with a dictionary:

d = {
    'male': 0,
    'female': 1,
    'other': 2
}

train_df["Sex"] = train_df["Sex"].map(d)
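One thing worth checking with this approach: as far as I know, Series.map with a dictionary returns NaN for any value that is not a key in the dictionary. A minimal sketch, assuming a made-up sample frame in which "unknown" is deliberately left out of the mapping:

```python
import pandas as pd

# Hypothetical sample data; "unknown" is intentionally absent from the mapping.
train_df = pd.DataFrame({"Sex": ["male", "female", "other", "unknown"]})

d = {"male": 0, "female": 1, "other": 2}
encoded = train_df["Sex"].map(d)

# Values that are not dictionary keys come back as NaN,
# so list them before overwriting the original column.
unmapped = train_df.loc[encoded.isna(), "Sex"].unique()
print(encoded.tolist())
print(list(unmapped))
```

If the list of unmapped values is empty, it should be safe to assign the result back to the column.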

For Salary, however, cut is a better fit if you need to derive the new values from numeric ranges:

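import numpy as np
import pandas as pd

train_df = pd.DataFrame({'Salary': [100, 200, 300, 500]})

# Bins are right-inclusive by default: (0, 200], (200, 400], (400, inf]
bins = [0, 200, 400, np.inf]
labels = ['low', 'medium', 'high']
train_df['label'] = pd.cut(train_df['Salary'], bins=bins, labels=labels)
print(train_df)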
   Salary   label
0     100     low
1     200     low
2     300  medium
3     500    high
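Since the question asks for integers rather than text labels, here is a small sketch of two ways to get them from the same bins. It reuses the sample data above; the column names code and code_from_labels are only illustrative.

```python
import numpy as np
import pandas as pd

train_df = pd.DataFrame({"Salary": [100, 200, 300, 500]})
bins = [0, 200, 400, np.inf]

# labels=False makes pd.cut return the integer bin index instead of a label.
train_df["code"] = pd.cut(train_df["Salary"], bins=bins, labels=False)

# Alternatively, keep the text labels and read the codes of the categorical.
labelled = pd.cut(train_df["Salary"], bins=bins, labels=["low", "medium", "high"])
train_df["code_from_labels"] = labelled.cat.codes

print(train_df)
```

Both columns should hold 0 for low, 1 for medium and 2 for high, because the codes follow the order of the bins.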

1

You can build a transformation dict, for example:

values = {
    "low": 0,
    "medium": 1,
    "high": 2
}
# Any level missing from the dict falls back to 0.
train_df["Salary"] = train_df["Salary"].apply(lambda level: values.get(level, 0))
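A possible alternative, not part of the answer above, is an ordered categorical, which lets pandas assign the integers from an explicit level order. This is only a sketch: the sample data and the Salary_code column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the levels from the question.
train_df = pd.DataFrame({"Salary": ["low", "high", "medium", "low"]})

# The order of this list decides which integer each level receives.
levels = ["low", "medium", "high"]
cat = pd.Categorical(train_df["Salary"], categories=levels, ordered=True)

# .codes holds the position of each value in `levels`;
# values not listed in `levels` get -1 instead of a silent default.
train_df["Salary_code"] = cat.codes
print(train_df)
```

Unlike the get(level, 0) fallback, unknown levels show up here as -1, which makes them easier to spot.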
