I have code that applies label encoding and then one-hot encoding, and then builds a DataFrame from the result. There are simpler ways to create the column names, but I want to understand the code below. `new_poke_df` is the existing DataFrame, and we simply concatenate it with the new features created by one-hot encoding: `new_gen_features` and `new_leg_features`.
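For anyone without the notebook open, here is a minimal pandas-only sketch of the encode-then-concatenate flow described above. It assumes a few hypothetical rows in place of the real `new_poke_df` and uses `cat.codes` (label encoding) and `pd.get_dummies` (one-hot encoding) as stand-ins; the notebook may use a different encoder, so the generated column names here are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real new_poke_df
new_poke_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Mewtwo", "Chikorita"],
    "Generation": ["Gen 1", "Gen 1", "Gen 2"],
    "Legendary": [False, True, False],
})

# Label encoding: map each category to an integer code (sorted category order)
new_poke_df["Gen_Label"] = new_poke_df["Generation"].astype("category").cat.codes
new_poke_df["Lgnd_Label"] = new_poke_df["Legendary"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per integer code
# (pd.get_dummies is a stand-in, not necessarily what the notebook uses)
new_gen_features = pd.get_dummies(new_poke_df["Gen_Label"], prefix="Gen", dtype=int)
new_leg_features = pd.get_dummies(new_poke_df["Lgnd_Label"], prefix="Lgnd", dtype=int)

# axis=1 places the frames side by side, aligning rows on the index
new_poke_ohe = pd.concat([new_poke_df, new_gen_features, new_leg_features], axis=1)
print(new_poke_ohe.columns.tolist())
```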
- I usually use `sum()` with numeric values, but here it is applied to lists of string labels. What is the reason for using `sum()` in this example, and what does it do?
- There is also an empty pair of square brackets, `[]`, at the end of the `sum()` call. What is it for?
The full notebook is on my GitHub if anyone wants to see the whole code: https://github.com/ibozkurt79/practical-machine-learning-with-python/blob/master/notebooks/Ch04_Feature_Engineering_and_Selection/Feature%20Engineering%20on%20Categorical%20Data.ipynb
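```python
import pandas as pd

# Join the original frame and the one-hot feature frames side by side (column-wise)
new_poke_ohe = pd.concat([new_poke_df, new_gen_features, new_leg_features],
                         axis=1)
# Build the ordered list of column names to display
columns = sum([['Name', 'Generation', 'Gen_Label'],
               gen_feature_labels,
               ['Legendary', 'Lgnd_Label'], leg_feature_labels], [])
new_poke_ohe[columns]
```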
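To see what the `sum(..., [])` call produces, here is a minimal sketch with hypothetical label lists standing in for `gen_feature_labels` and `leg_feature_labels` (the real names depend on the notebook's encoder). Python's built-in `sum(iterable, start)` computes `start + item0 + item1 + ...`, so with `start=[]` every `+` is list concatenation and the result is one flat list of column names.

```python
from itertools import chain

import pandas as pd

# Hypothetical label lists; the notebook's actual labels may differ
gen_feature_labels = ["Gen_0", "Gen_1"]
leg_feature_labels = ["Lgnd_0", "Lgnd_1"]

groups = [["Name", "Generation", "Gen_Label"],
          gen_feature_labels,
          ["Legendary", "Lgnd_Label"],
          leg_feature_labels]

# start=[] makes sum() compute [] + groups[0] + groups[1] + ... (list concatenation)
columns = sum(groups, [])
print(columns)
# ['Name', 'Generation', 'Gen_Label', 'Gen_0', 'Gen_1', 'Legendary', 'Lgnd_Label', 'Lgnd_0', 'Lgnd_1']

# Without the [] start value, sum() starts from 0, and 0 + list raises TypeError
try:
    sum(groups)
except TypeError as err:
    print("TypeError:", err)

# Equivalent flattening that avoids building intermediate lists
assert list(chain.from_iterable(groups)) == columns

# Indexing a DataFrame with a list of labels returns those columns in that order
df = pd.DataFrame([list(range(len(columns)))], columns=columns[::-1])
print(df[columns].columns.tolist() == columns)  # True
```

Repeated `+` copies the growing list each time, which is harmless for a handful of column groups; `itertools.chain.from_iterable` is the usual choice when there are many.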