I'm trying to replace the column "Name" with a new column "Gender", based on the title that each value in "Name" starts with.
INPUT:
df['Name'].value_counts()
OUTPUT:
Mr. Gordon Hemmings 1
Miss Jane Wilkins 1
Mrs. Audrey North 1
Mrs. Wanda Sharp 1
Mr. Victor Hemmings 1
..
Miss Heather Abraham 1
Mrs. Kylie Hart 1
Mr. Ian Langdon 1
Mr. Gordon Watson 1
Miss Irene Vance 1
Name: Name, Length: 4999, dtype: int64
Notice the Mr., Mrs., and Miss prefixes. The first question that comes to mind is: how many different titles are there?
INPUT:
df.Name.str.split().str[0].value_counts(dropna=False)
OUTPUT:
Mr. 3351
Mrs. 937
Miss 711
NaN 1
Name: Name, dtype: int64
Now I'm trying the following:
#Replace missing value
df['Name'].fillna('Mr.', inplace=True)
# Create Column Gender
df['Gender'] = df['Name']
for i in range(0, df[0]):
    A = df['Name'].values[i][0:3]=="Mr."
    df['Gender'].values[i] = A
df.loc[df['Gender']==True, 'Gender']="Male"
df.loc[df['Gender']==False, 'Gender']="Female"
del df['Name'] #Delete column 'Name'
df
But I'm missing something since I get the following error:
KeyError: 0
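The KeyError most likely comes from `range(0, df[0])`: `df[0]` asks for a column labelled 0, not for the number of rows, and no such column exists. Below is a minimal sketch of what the loop seems intended to do, written without a loop. The four-row sample frame is made up for illustration and stands in for the real data. The rule is the same as in the code above: missing names are treated as "Mr.", and any name starting with "Mr." is mapped to "Male".

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real 4999-row frame.
df = pd.DataFrame({
    "Name": ["Mr. Gordon Hemmings", "Miss Jane Wilkins",
             "Mrs. Audrey North", np.nan]
})

# df[0] would look up a column labelled 0 and raise KeyError: 0.
# The number of rows is df.shape[0] (or len(df)).
n_rows = df.shape[0]

# Treat missing names as "Mr.", as in the original code.
names = df["Name"].fillna("Mr.")

# "Mrs." does not start with "Mr." (third character is "s"),
# so only the "Mr." rows become "Male".
df["Gender"] = np.where(names.str.startswith("Mr."), "Male", "Female")

# Drop the original column once Gender has been derived.
df = df.drop(columns="Name")
print(df)
```

If the loop has to stay, replacing `df[0]` with `df.shape[0]` should be enough to get past the KeyError, though the vectorized version avoids writing through `.values` altogether.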