Create new column in pandas based on value of another column

Question

I have a dataset with the genders of various individuals. Say the dataset looks like this:

Male
Female
Male and Female
Male
Male
Female
Trans
Unknown
Male and Female

Some identify themselves as male, some as female, and some as both male and female.

Now, what I want to do is create a new column in pandas that maps

Males to 1,
Females to 2,
Others to 3

I wrote some code:

def gender(x):
    if x.str.contains("Male")
        return 1
    elif x.str.contains("Female")
        return 2
    elif return 3

df["Gender Values"] = df["Gender"].apply(gender)

But I was getting an error saying that there is no attribute contains. I tried removing str:

x.contains("Male")

and I got the same error.

Is there a better way to do this?

jezrael · Accepted Answer · 2016-09-19 05:51:13Z

11

You can use:

def gender(x):
    if "Female" in x and "Male" in x:
        return 3
    elif "Male" in x:
        return 1
    elif "Female" in x:
        return 2
    else:
        return 4

df["Gender Values"] = df["Gender"].apply(gender)

print (df)
            Gender  Gender Values
0             Male              1
1           Female              2
2  Male and Female              3
3             Male              1
4             Male              1
5           Female              2
6            Trans              4
7          Unknown              4
8  Male and Female              3
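For a large column, a vectorized variant may be worth trying instead of apply. The following is only a sketch under assumptions, not part of the answer above: it rebuilds the sample data from the question, uses Series.str.contains for the substring tests and numpy.select to choose the code, and the names has_male, has_female, conditions and choices are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed column name: "Gender")
df = pd.DataFrame({"Gender": [
    "Male", "Female", "Male and Female", "Male", "Male",
    "Female", "Trans", "Unknown", "Male and Female",
]})

# .str.contains works on the whole Series; the match is case-sensitive,
# so "Female" does not count as containing "Male"
has_male = df["Gender"].str.contains("Male", regex=False)
has_female = df["Gender"].str.contains("Female", regex=False)

# np.select takes the first condition that is True for each row,
# so the "both" case has to come first
conditions = [has_male & has_female, has_male, has_female]
choices = [3, 1, 2]
df["Gender Values"] = np.select(conditions, choices, default=4)

print(df)
```

This also shows why the code in the question failed: inside apply, each x is a plain Python string, which has no .str accessor. The accessor exists on the Series itself, so df["Gender"].str.contains(...) works while x.str.contains(...) does not.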

1 Comment

Emac Over a year ago

Found this from Google and this is the best solution I could find! Thank you!

Batman · Accepted Answer · 2016-09-19 02:28:13Z

1

Create a mapping function, and use that to map the values.

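def map_identity(identity):
    # Compare case-insensitively; anything that is not exactly
    # "male" or "female" falls through to 3
    if identity.lower() == 'male':
        return 1
    elif identity.lower() == 'female':
        return 2
    else:
        return 3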

df["B"] = df["A"].map(map_identity)
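When the match is exact, as it is here, the same idea can probably be written with a dictionary passed to Series.map. This is a minimal sketch, assuming the column names "A" and "B" from the snippet above and a small made-up sample; the dictionary name codes is illustrative.

```python
import pandas as pd

# Small illustrative sample (assumed data, column "A" as in the answer)
df = pd.DataFrame({"A": ["Male", "Female", "Male and Female", "Trans"]})

# Exact-match lookup table; keys are lower-case to match str.lower()
codes = {"male": 1, "female": 2}

# Values missing from the dictionary become NaN after map,
# so fill them with 3 ("Others") and convert back to integers
df["B"] = df["A"].str.lower().map(codes).fillna(3).astype(int)

print(df)
```

Note that "Male and Female" is not an exact match for either key, so it ends up as 3, the same as in the function version.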

Rajarshi Das · Accepted Answer · 2020-05-27 11:40:11Z

0

If you don't specifically need 1, 2 and 3 for Males, Females and Others in that order, you can try LabelEncoder from scikit-learn. It assigns each unique category in the column an integer from 0 to n_classes - 1, following the sorted order of the categories.

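from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
# fit_transform learns the categories and returns their integer codes
df["Gender Values"] = encoder.fit_transform(df["gender"])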

For details, you can check Label Encoder documentation.
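If pulling in scikit-learn only for this feels heavy, pandas' categorical dtype should give comparable integer codes. This is a swapped-in technique, not LabelEncoder itself: the sketch below assumes the sample data from the question and a column named "Gender", and relies on pandas inferring the categories in sorted order.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed column name: "Gender")
df = pd.DataFrame({"Gender": [
    "Male", "Female", "Male and Female", "Male", "Male",
    "Female", "Trans", "Unknown", "Male and Female",
]})

# Converting to "category" infers the unique values in sorted order;
# .cat.codes then gives each row the position of its category
as_category = df["Gender"].astype("category")
df["Gender Values"] = as_category.cat.codes

# Show which integer corresponds to which label
print(dict(enumerate(as_category.cat.categories)))
print(df)
```

The codes start at 0 and follow alphabetical order (Female is 0, Male is 1, and so on), so this only fits when the specific numbers don't matter.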

Hope this helps!
