I'm trying to replace the column "Name" with a new column "Gender", based on the first few letters of each value in "Name".

INPUT:

df['Name'].value_counts()

OUTPUT:

Mr. Gordon Hemmings     1
Miss Jane Wilkins       1
Mrs. Audrey North       1
Mrs. Wanda Sharp        1
Mr. Victor Hemmings     1
                       ..
Miss Heather Abraham    1
Mrs. Kylie Hart         1
Mr. Ian Langdon         1
Mr. Gordon Watson       1
Miss Irene Vance        1

Name: Name, Length: 4999, dtype: int64

Now, see the Mr., Mrs., and Miss? The first question that comes to mind is: how many different first words are there?

INPUT:

df.Name.str.split().str[0].value_counts(dropna=False)

OUTPUT:

Mr.     3351
Mrs.     937
Miss     711
NaN        1

Name: Name, dtype: int64

Now I'm trying to:

# Replace missing value
df['Name'].fillna('Mr.', inplace=True)

# Create Column Gender
df['Gender'] = df['Name']

for i in range(0, df[0]):
    A = df['Name'].values[i][0:3]=="Mr."
    df['Gender'].values[i] = A

df.loc[df['Gender']==True, 'Gender']="Male"
df.loc[df['Gender']==False, 'Gender']="Female"

del df['Name'] #Delete column 'Name'

df

But I'm missing something since I get the following error:

KeyError: 0

1 Answer

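The KeyError occurs because df[0] looks up a column labelled 0, and your DataFrame has no such column. The loop bound needs to be a row count such as len(df). However, I would ditch that loop and use a vectorized approach, which is more efficient.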
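To illustrate the cause of the error, here is a minimal sketch on a two-row frame built from names in the question. The variable names `error` and `n_rows` are my own, and the frame is a stand-in for the real data, not the asker's actual DataFrame.

```python
import pandas as pd

# Small stand-in frame; the real data has about 5000 rows.
df = pd.DataFrame({'Name': ['Mr. Gordon Hemmings', 'Miss Jane Wilkins']})

# df[0] is a column lookup, and there is no column labelled 0.
error = None
try:
    df[0]
except KeyError as exc:
    error = exc
    print('KeyError:', exc)

# A row count is what range() needs here.
n_rows = len(df)
flags = [df['Name'].iloc[i][:3] == 'Mr.' for i in range(n_rows)]
print(flags)
```

With the bound fixed, the loop would run, but it is still slower than the vectorized string methods.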

After filling the missing value with fillna(), you can use np.where with str.contains to flag the names that contain "Mr.", then drop the Name column. This assumes numpy is imported as np and pandas as pd:

df['Name'] = df['Name'].fillna('Mr.')
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
df
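Since the title always sits at the start of the name, a plain prefix test may be a simpler alternative to a regex. The sketch below is a variation on the approach above, not the answer's original code. It assumes, as the question does, that a missing name should count as "Mr.". The sample Series and the names `is_mr` and `gender` are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative sample, including one missing name.
names = pd.Series(['Mr. Gordon Hemmings', 'Miss Jane Wilkins',
                   'Mrs. Audrey North', None])

# str.startswith does a literal prefix comparison, so no escaping is needed
# and "Mrs." is not mistaken for "Mr.".
is_mr = names.fillna('Mr.').str.startswith('Mr.')
gender = np.where(is_mr, 'Male', 'Female')
print(gender)
```

A prefix test also avoids false positives if "Mr." ever appears somewhere in the middle of a name.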

Full example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': {0: 'Mr. Gordon Hemmings',
  1: 'Miss Jane Wilkins',
  2: 'Mrs. Audrey North',
  3: 'Mrs. Wanda Sharp',
  4: 'Mr. Victor Hemmings'},
 'Value': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}})
print(df)
df['Name'] = df['Name'].fillna('Mr.')
# Raw string so the backslash reaches the regex engine unchanged
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
print('\n')
print(df)

Output:

                  Name  Value
0  Mr. Gordon Hemmings      1
1    Miss Jane Wilkins      1
2    Mrs. Audrey North      1
3     Mrs. Wanda Sharp      1
4  Mr. Victor Hemmings      1


   Value  Gender
0      1    Male
1      1  Female
2      1  Female
3      1  Female
4      1    Male

2 Comments

That didn't work. I get this: df['Gender'].value_counts() Male 4289 Female 711 Name: Gender, dtype: int64. But that is wrong: it seems to have singled out only Miss, when it should return True only for "Mr." and False otherwise.
@jps17183 I forgot that . is a regex character, so you need to escape it with a backslash (\).
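The counts in the first comment are consistent with an unescaped dot: in a regex, `.` matches any character, so the pattern `Mr.` also matches the "Mrs" in "Mrs.". A minimal sketch of the difference, using three names from the question (the variable names are illustrative):

```python
import pandas as pd

names = pd.Series(['Mr. Gordon Hemmings', 'Miss Jane Wilkins',
                   'Mrs. Audrey North'])

# Unescaped: "." matches any character, so "Mrs" counts as a match.
unescaped = names.str.contains('Mr.')

# Escaped: the dot must be a literal ".", so only "Mr." matches.
escaped = names.str.contains(r'Mr\.')

# Alternative: turn regex matching off and search for the literal text.
literal = names.str.contains('Mr.', regex=False)

print(unescaped.tolist(), escaped.tolist(), literal.tolist())
```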
