I'm trying to encode the non-numeric columns of a pandas DataFrame as numeric values. I'm using:
```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

df = df.fillna('0')
msk = np.random.rand(len(df)) < 0.8
df_train = df[msk]
df_test = df[~msk]
columns_to_encode = df.select_dtypes(exclude=[np.number]).columns
encoder_dict = {col: LabelEncoder() for col in columns_to_encode}
df_train_enc = df_train
df_test_enc = df_test
for col in columns_to_encode:
    encoder_dict[col].fit_transform(df_train_enc[col])
```
This, however, throws `TypeError: '<' not supported between instances of 'str' and 'float'`. What am I missing here? I thought `LabelEncoder` could transform strings to numeric values...
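A minimal reproduction may help pin down the cause. It uses hypothetical toy data (the `price` and `city` columns are made up). `LabelEncoder` sorts the unique values of a column, and a `str` cannot be compared with a `float`. One way such a mix can arise even after `df.fillna('0')`: filling a float column with the *string* `'0'` upcasts it to `object` dtype. `select_dtypes(exclude=[np.number])` then selects it for encoding. The exact error message varies between scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: a float column with a gap and a string column.
df = pd.DataFrame({"price": [1.5, np.nan, 3.0], "city": ["a", "b", "a"]})

# Filling with the *string* '0' upcasts the float column to object dtype,
# so it now holds both floats (1.5, 3.0) and a str ('0').
filled = df.fillna("0")
print(filled.dtypes)

# select_dtypes therefore treats "price" as non-numeric, so it gets encoded too.
print(list(filled.select_dtypes(exclude=[np.number]).columns))  # ['price', 'city']

error = None
try:
    LabelEncoder().fit_transform(filled["price"])
except TypeError as exc:
    # LabelEncoder sorts the unique values; str and float cannot be compared.
    # The wording of the message depends on the scikit-learn version.
    error = exc
print("TypeError:", error)
```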
You have `NaN` values in your data; see: stackoverflow.com/q/43956705/4121573
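`NaN` is a float, so an object column holding strings and `NaN` mixes `str` and `float`. One common workaround is sketched below; it is not the only option. Columns are selected *before* any filling, so numeric columns stay numeric. Each selected column has its gaps filled and is then cast to `str`, so every label is comparable. Each encoder is fit on the training split only. The toy data, the fixed `iloc` split (a reproducible stand-in for the random mask), the `"missing"` placeholder and the `-1` code for test labels unseen during fitting are all hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: a numeric column with gaps, a string column with a
# NaN, and an object column mixing str and int values.
df = pd.DataFrame({
    "price": [1.5, np.nan, 3.0, 2.0, np.nan],
    "city": ["a", "b", np.nan, "a", "c"],
    "size": ["S", 1, "M", "S", "L"],
})

# Pick the columns to encode *before* filling, so numeric columns stay numeric.
columns_to_encode = df.select_dtypes(exclude=[np.number]).columns

# Deterministic split instead of the random mask, so the output is reproducible.
df_train = df.iloc[:4]
df_test = df.iloc[4:]

# .copy() matters: plain assignment would make df_train_enc an alias of df_train.
df_train_enc = df_train.copy()
df_test_enc = df_test.copy()
encoder_dict = {}

for col in columns_to_encode:
    # Replace missing values and force every entry to str, so all labels compare.
    train_vals = df_train[col].fillna("missing").astype(str)
    test_vals = df_test[col].fillna("missing").astype(str)

    enc = LabelEncoder().fit(train_vals)  # learn the label set from train only
    df_train_enc[col] = enc.transform(train_vals)

    # transform() raises ValueError on labels not seen during fit, so encode
    # only the known ones and mark the rest as -1.
    known = test_vals.isin(enc.classes_).to_numpy()
    codes = np.full(len(test_vals), -1)
    codes[known] = enc.transform(test_vals[known])
    df_test_enc[col] = codes

    encoder_dict[col] = enc

print(df_train_enc)
print(df_test_enc)
```

The scikit-learn documentation describes `LabelEncoder` as intended for target values (`y`). For feature columns, `OrdinalEncoder` with `handle_unknown="use_encoded_value"` and an `unknown_value` covers the unseen-label case without the manual mask.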