
I have code that applies label encoding and then one-hot encoding, after which it creates a DataFrame. There are other, simpler ways to create the column names, but I want to understand the code below. new_poke_df is the existing DataFrame, and we are concatenating it with the new features created by one-hot encoding: new_gen_features and new_leg_features.

  1. I usually use sum() with numeric values, but here it is used with lists of string labels. What is the reason for, and the effect of, sum() in this example?
  2. There is also an empty pair of square brackets, [], at the end. What is it for?

I have also linked the full notebook on GitHub in case anyone wants to see the whole code: https://github.com/ibozkurt79/practical-machine-learning-with-python/blob/master/notebooks/Ch04_Feature_Engineering_and_Selection/Feature%20Engineering%20on%20Categorical%20Data.ipynb

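```python
# Append the one-hot encoded feature columns to the original DataFrame
new_poke_ohe = pd.concat([new_poke_df, new_gen_features, new_leg_features],
                         axis=1)
# Build one flat list of column names from the four smaller lists
columns = sum([['Name', 'Generation', 'Gen_Label'],
               gen_feature_labels,
               ['Legendary', 'Lgnd_Label'], leg_feature_labels], [])
# Select those columns, in that order, from the combined DataFrame
new_poke_ohe[columns]
```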

1 Answer


sum(list_of_list, []) is a concise, commonly used Python idiom for flattening a list of lists into a single list.

See this example:

```python
list_of_list = [['A', 'B', 'C'], ['D'], ['E', 'F', 'G', 'H']]
sum(list_of_list, [])
```

Output:

```
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```

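Notice how the list of lists (a 2D structure) has become a flat, 1D list. sum() is not adding the strings themselves: it applies + to the inner lists one after another, and for lists + means concatenation. The strings are simply carried along as elements.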
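To check that sum() is just chaining + between the inner lists, its result can be compared with the same concatenation written out by hand:

```python
# The nested list from the example above
list_of_list = [['A', 'B', 'C'], ['D'], ['E', 'F', 'G', 'H']]

# sum(iterable, start) evaluates start + item1 + item2 + ... from left to right
flattened = sum(list_of_list, [])

# The same concatenation written out by hand
by_hand = [] + ['A', 'B', 'C'] + ['D'] + ['E', 'F', 'G', 'H']

print(flattened == by_hand)  # True
```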

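The [] tells sum that the starting object to add on to is an empty list (quoted from @piRSquared). This argument is required here: sum() starts from 0 by default, and 0 + ['Name', 'Generation', 'Gen_Label'] raises a TypeError because an integer cannot be added to a list.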
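A quick way to confirm the role of the start value is to call sum() on the example list with and without it:

```python
list_of_list = [['A', 'B', 'C'], ['D'], ['E', 'F', 'G', 'H']]

# Without a start value, sum() begins from the integer 0,
# so its first step would be 0 + ['A', 'B', 'C'].
error_message = None
try:
    sum(list_of_list)
except TypeError as exc:
    error_message = str(exc)
print("Without []:", error_message)

# With [] as the start value, the first step is [] + ['A', 'B', 'C'].
flattened = sum(list_of_list, [])
print("With []:", flattened)
```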

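So what is happening here is that you are building one flat list of column names out of several smaller lists, matching the columns of the DataFrames combined by pd.concat. new_poke_ohe[columns] then selects those columns from the combined DataFrame, in that order.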
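Applied to the question, the idiom can be sketched as below. The values of gen_feature_labels and leg_feature_labels are hypothetical placeholders (the real ones come from the one-hot encoding step in the notebook); only the shape of the result matters here. The sketch also shows itertools.chain.from_iterable, a standard-library alternative that produces the same list.

```python
from itertools import chain

# Hypothetical label lists standing in for the notebook's
# gen_feature_labels and leg_feature_labels (real values may differ).
gen_feature_labels = ['Gen 1', 'Gen 2', 'Gen 3']
leg_feature_labels = ['Legendary_False', 'Legendary_True']

column_groups = [['Name', 'Generation', 'Gen_Label'],
                 gen_feature_labels,
                 ['Legendary', 'Lgnd_Label'],
                 leg_feature_labels]

# The idiom from the question: concatenate the groups into one flat list
columns = sum(column_groups, [])

# Alternative: chain the groups lazily, then materialize a single list
columns_chain = list(chain.from_iterable(column_groups))

print(columns)
print(columns == columns_chain)  # True
```

With only four short groups, either form is fine. For many lists, sum() gets slow because each + builds a new list and copies everything accumulated so far, so chain.from_iterable (or a nested list comprehension) is usually preferred there.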
