I'm pre-processing some data for machine learning. I'm trying to clean up some NaN values by replacing them with 'unknown' plus a prefix or suffix based on the column name.
The reason is that when I use one-hot encoding, I can't have multiple columns with the same name being fed into XGBoost.
What I have so far is the following:
df = df.apply(lambda x: x.replace(np.nan, 'unknown'))
I'd like to replace every NaN in the df with 'unknown_columnname' instead. Is there a simple way to do this?
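Inside apply, each column is passed in as a Series whose .name is the column label, so you can build the fill value from it:
df = df.apply(lambda x: x.replace(np.nan, f'unknown_{x.name}'))
You can also use fillna, which is the more direct way to fill missing values:
df = df.apply(lambda x: x.fillna(f'unknown_{x.name}'))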
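A self-contained sketch, assuming a small hypothetical frame with two categorical columns, color and size (not from the question). It shows the apply approach above next to an equivalent DataFrame.fillna call that takes a dict mapping each column to its own fill value:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with missing values in two categorical columns
df = pd.DataFrame({
    "color": ["red", np.nan, "blue"],
    "size": [np.nan, "S", "M"],
})

# Option 1: apply works column by column; x.name is the column label
filled_apply = df.apply(lambda x: x.fillna(f"unknown_{x.name}"))

# Option 2: fillna with a dict gives each column its own fill value
filled_dict = df.fillna({col: f"unknown_{col}" for col in df.columns})

print(filled_dict)
```

One caveat: if the frame also has numeric columns, filling them with strings turns them into object dtype. In that case you may want to build the dict only over the categorical columns, for example from df.select_dtypes(include="object").columns.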