
I have a dataset with the genders of various individuals. Say it looks like this:

Male
Female
Male and Female
Male
Male
Female
Trans
Unknown
Male and Female

Some identify as male, some as female, and some as both male and female.

Now I want to create a new column in pandas that maps:

Males to 1, 
Females to 2,
Others to 3

I wrote some code:

def gender(x):
    if x.str.contains("Male"):
        return 1
    elif x.str.contains("Female"):
        return 2
    else:
        return 3

df["Gender Values"] = df["Gender"].apply(gender)

But I got an error saying that there is no attribute contains. I tried removing str:

x.contains("Male")

and I got the same error.

Is there a better way to do this?

3 Answers


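apply calls the function once per value, so x is a plain Python string, which has no .str accessor and no contains method. Use the in operator instead, and check for both words first so that "Male and Female" gets its own code: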

def gender(x):
    if "Female" in x and "Male" in x:
        return 3
    elif "Male" in x:
        return 1
    elif "Female" in x:
        return 2
    else:
        return 4

df["Gender Values"] = df["Gender"].apply(gender)

print(df)
            Gender  Gender Values
0             Male              1
1           Female              2
2  Male and Female              3
3             Male              1
4             Male              1
5           Female              2
6            Trans              4
7          Unknown              4
8  Male and Female              3
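
The .str.contains idea from the question does work when it is called on the whole column rather than inside apply. A minimal sketch, assuming the same codes as above (3 for both, 4 for anything else); the use of numpy.select and the small sample frame are my own choices for illustration:

```python
import numpy as np
import pandas as pd

# Small sample frame, made up for illustration
df = pd.DataFrame(
    {"Gender": ["Male", "Female", "Male and Female", "Trans", "Unknown"]}
)

# .str is an accessor on the Series, not on the individual strings
has_male = df["Gender"].str.contains("Male", regex=False)
has_female = df["Gender"].str.contains("Female", regex=False)

# np.select takes, for each row, the choice of the first condition that is True
df["Gender Values"] = np.select(
    [has_male & has_female, has_male, has_female],
    [3, 1, 2],
    default=4,
)

print(df)
```

The match is case-sensitive, so "Female" does not count as containing "Male".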

1 Comment

Found this from Google and this is the best solution I could find! Thank you!

Create a mapping function, and use that to map the values.

def map_identity(identity):
    # Compare case-insensitively; anything that is not exactly male/female maps to 3
    if identity.lower() == 'male':
        return 1
    elif identity.lower() == 'female':
        return 2
    else:
        return 3

df["B"] = df["A"].map(map_identity)
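
Since Series.map also accepts a dict, the same exact-match mapping can be sketched without a function. This assumes the column holds only strings; the codes dict and the sample frame are hypothetical names and data for illustration:

```python
import pandas as pd

# Sample frame, made up for illustration
df = pd.DataFrame({"A": ["Male", "female", "Male and Female", "Unknown"]})

# Hypothetical lookup table; keys are lower-case to match case-insensitively
codes = {"male": 1, "female": 2}

# Values missing from the dict become NaN, which fillna turns into 3
df["B"] = df["A"].str.lower().map(codes).fillna(3).astype(int)

print(df)
```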


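If you don't specifically need 1, 2 and 3 for Males, Females and Others in that order, you can try LabelEncoder from scikit-learn. It assigns each unique category in the column an integer from 0 to n_classes - 1, following the sorted order of the labels (not a random order). Note that every distinct value, including "Male and Female", "Trans" and "Unknown", gets its own number.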

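from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
# fit_transform learns the categories and returns the integer code for each row
df["Gender Values"] = encoder.fit_transform(df["gender"])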

For details, you can check the LabelEncoder documentation.
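
If scikit-learn is not available, a comparable encoding can be sketched with pandas' own categorical dtype. This is a swapped-in technique rather than LabelEncoder itself, and the sample frame is made up for illustration. For string data, pandas sorts the categories, so the codes follow alphabetical order:

```python
import pandas as pd

# Sample frame, made up for illustration
df = pd.DataFrame({"gender": ["Male", "Female", "Trans", "Male"]})

# Convert to a categorical column; categories are the sorted unique values
as_category = df["gender"].astype("category")

# cat.codes gives each row the position of its value in the category list
df["Gender Values"] = as_category.cat.codes

print(list(as_category.cat.categories))
print(df)
```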

Hope this helps!
