Hello, I am following a video course on Udemy in which we apply a random forest classifier. Before doing so, we convert one of the columns in a DataFrame to strings. The 'Cabin' column holds values such as "4C", so to reduce the number of unique values we keep only the first character and map it onto a new column, 'Cabin_mapped'.
```python
# keep only the first character of each cabin string
data['Cabin_mapped'] = data['Cabin'].astype(str).str[0]

# this transforms the letters into numbers
cabin_dict = {k: i for i, k in enumerate(data['Cabin_mapped'].unique(), 0)}
data.loc[:, 'Cabin_mapped'] = data.loc[:, 'Cabin_mapped'].map(cabin_dict)

data[['Cabin_mapped', 'Cabin']].head()
```
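A minimal sketch of the same mapping on a small made-up DataFrame (the cabin values are invented for illustration, and missing cabins are left out to keep it short). It uses plain column assignment instead of `data.loc[:, 'Cabin_mapped'] = ...` because, in recent pandas versions, assigning through `.loc[:, ...]` may write into the existing string column in place and leave it with `object` dtype. `pd.factorize` is shown as an alternative; it numbers values in order of first appearance, just like the `enumerate(unique())` dictionary.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'Cabin' column
data = pd.DataFrame({'Cabin': ['C85', 'E46', 'C123', 'B28', 'E31']})

# Keep only the first character of each cabin
data['Cabin_mapped'] = data['Cabin'].astype(str).str[0]

# Dictionary approach from the question: label characters in order of first appearance
cabin_dict = {k: i for i, k in enumerate(data['Cabin_mapped'].unique())}

# Plain assignment replaces the column, so the result has an integer dtype
data['Cabin_mapped'] = data['Cabin_mapped'].map(cabin_dict)
print(data['Cabin_mapped'].tolist())  # [0, 1, 0, 2, 1]

# Alternative: pd.factorize returns the same codes plus the unique labels
codes, uniques = pd.factorize(data['Cabin'].astype(str).str[0])
print(list(uniques))
```

Either way, checking `data['Cabin_mapped'].dtype` after the mapping confirms the column is numeric before it is handed to a model.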
The part below simply splits the data into training and test sets; the parameters shouldn't matter for figuring out the problem.
```python
from sklearn.model_selection import train_test_split

X_train_less_cat, X_test_less_cat, y_train, y_test = train_test_split(
    data[use_cols].fillna(0), data.Survived, test_size=0.3, random_state=0)
```
When I fit the model, I get an error saying that a string could not be converted to a float:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, random_state=39)
rf.fit(X_train_less_cat, y_train)
```
It seems that one of the inputs has to be converted to a float before the random forest can use it, even though the error does not appear in the tutorial video. If anyone could help me out, that would be great.
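`RandomForestClassifier.fit` raises "could not convert string to float" when any column it receives still holds text, and `fillna(0)` only fills missing values; it does not turn strings into numbers. A minimal pandas-only sketch for finding the offending columns before calling `fit`, assuming a hypothetical `use_cols` list and made-up data in which 'Sex' was never encoded:

```python
import pandas as pd

# Hypothetical data: 'Cabin_mapped' is already numeric, 'Sex' is still text
data = pd.DataFrame({
    'Pclass': [3, 1, 3, 1],
    'Sex': ['male', 'female', 'female', 'male'],
    'Cabin_mapped': [0, 1, 0, 2],
    'Survived': [0, 1, 1, 0],
})
use_cols = ['Pclass', 'Sex', 'Cabin_mapped']  # hypothetical feature list

X = data[use_cols].fillna(0)

# fillna(0) leaves text untouched, so list every column whose dtype is not numeric
non_numeric = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
print(non_numeric)  # ['Sex']

# One option: encode the remaining text column the same way the cabins were encoded
X['Sex'] = X['Sex'].map({'male': 0, 'female': 1})
print(X.dtypes)
```

Once `non_numeric` comes back empty for the real `data[use_cols]`, the features should be acceptable to `train_test_split` and `rf.fit`. A column listed here that holds integers but has `object` dtype is also worth converting explicitly, for example with `astype(int)`.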
