
I'm trying to encode the non-numeric columns of a pandas DataFrame as numeric values. I'm using:

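import numpy as np
from sklearn.preprocessing import LabelEncoder

df = df.fillna('0')
msk = np.random.rand(len(df)) < 0.8   # random ~80/20 train/test split
df_train = df[msk]
df_test = df[~msk]
columns_to_encode = df.select_dtypes(exclude=[np.number]).columns
encoder_dict = {col: LabelEncoder() for col in columns_to_encode}
# .copy() so the encoded frames are independent of df, not views of it
df_train_enc = df_train.copy()
df_test_enc = df_test.copy()
for col in columns_to_encode:
    # keep the encoded result instead of discarding it
    df_train_enc[col] = encoder_dict[col].fit_transform(df_train_enc[col])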

This, however, throws TypeError: '<' not supported between instances of 'str' and 'float'. What am I missing here? I thought LabelEncoder could transform strings into numeric values...
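A minimal reproduction of the error, assuming a hypothetical object column that holds both Python strings and floats (the question does not show the actual data). LabelEncoder has to sort the unique values of a column to build its classes, and Python cannot order a str against a float. The exact error message depends on the scikit-learn version, but it is a TypeError either way:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column: an object column holding both str and float values
s = pd.Series(["red", 1.5, "blue", 2.0], dtype=object)

error = None
try:
    LabelEncoder().fit_transform(s)
except TypeError as exc:
    # The unique labels must be sorted, and str vs float has no ordering
    error = exc
    print(f"TypeError: {exc}")
```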

  • You might have NaN values in your data, see: stackoverflow.com/q/43956705/4121573 (Commented Apr 13, 2018 at 10:57)
  • I don't, see updated post! (Commented Apr 13, 2018 at 10:59)
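Having no NaN left does not mean a column holds a single type. One plausible source, though the question doesn't show the data: df.fillna('0') fills with the string '0', so a float column that contained NaN can be upcast to object dtype and end up mixing floats with the string '0'. A quick check, sketched on a hypothetical DataFrame (the column names "color" and "size" are illustrative, not from the question), is to count the Python type of every value in each non-numeric column:

```python
import pandas as pd

# Hypothetical data: no NaN anywhere, but "color" mixes str and float values
df = pd.DataFrame({"color": ["red", 1.5, "blue", "red"], "size": [1, 2, 3, 4]})

type_summary = {}
for col in df.select_dtypes(exclude="number").columns:
    # Count the Python type name of every value in the column
    counts = df[col].map(lambda v: type(v).__name__).value_counts()
    type_summary[col] = counts.to_dict()
    if len(counts) > 1:
        print(f"{col!r} mixes types: {type_summary[col]}")
```

Any column reported here will trip LabelEncoder unless its values are converted to a single type first.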

1 Answer


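LabelEncoder works on string labels without an issue. The problem is mixed types: to build its classes, LabelEncoder sorts the unique values of the column, and Python cannot compare a str with a float. So if a column contains both (because of missing values, for example), cast it to str first: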

for col in columns_to_encode:
    # astype(str) makes every label a string, so the labels can be sorted
    df_train_enc[col] = encoder_dict[col].fit_transform(df_train_enc[col].astype(str))
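Putting the answer together with the rest of the question's pipeline, a minimal sketch under stated assumptions: the small DataFrame is hypothetical, and a fixed boolean mask stands in for the random 80/20 split so the result is reproducible. The test set is encoded with transform, reusing the encoder fitted on the training set:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: "color" mixes str and float, "size" is numeric
df = pd.DataFrame({
    "color": ["red", 1.5, "blue", "red", 1.5, "blue"],
    "size": [1, 2, 3, 4, 5, 6],
})
df = df.fillna("0")  # mirrors the question; a no-op here since there is no NaN

# Fixed mask instead of np.random.rand(len(df)) < 0.8, so the split is reproducible
msk = np.array([True, True, True, False, True, False])
df_train_enc = df[msk].copy()
df_test_enc = df[~msk].copy()

columns_to_encode = df.select_dtypes(exclude=[np.number]).columns
encoder_dict = {col: LabelEncoder() for col in columns_to_encode}

for col in columns_to_encode:
    # Cast to str so every label has the same type and can be sorted
    df_train_enc[col] = encoder_dict[col].fit_transform(df_train_enc[col].astype(str))
    # Reuse the fitted encoder; transform raises ValueError on labels unseen in training
    df_test_enc[col] = encoder_dict[col].transform(df_test_enc[col].astype(str))
```

With this data the fitted classes are '1.5', 'blue', 'red' (digits sort before letters), so the training "color" column becomes [2, 0, 1, 0] and the test one [2, 1]. Note that scikit-learn's documentation describes LabelEncoder as intended for target values (y); for feature columns, OrdinalEncoder is the feature-oriented counterpart.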

2 Comments

Did you try astype(str)?
Yes, that helped, will accept in 7 mins. Thank you!
