Python 3/Pandas Dataframe Splitting a column in multiple columns with binary values

Question

I'm wondering what the cleanest way is to do the following. I'd like to create a new column for every distinct value of a column in an existing DataFrame. I don't know in advance how many values can exist, and in each row every new column should be 0 except the one for the option that row has, which should be 1. That might not be easy to understand, so here is an example with some pseudo code too:

Let's imagine I have such a DataFrame:

| Name | Surname | Color | Genre |
| Paul | hellppp | Blue  | Male  |
| Erik | meeeeee | Red   | Woman |
| Igor | plllsss | Green | Male  |

Should become

| Name | Surname | Red | Blue | Green | Male | Woman |
| Paul | hellppp | 0   | 1    | 0     | 1    | 0     |
| Erik | meeeeee | 1   | 0    | 0     | 0    | 1     |
| Igor | plllsss | 0   | 0    | 1     | 1    | 0     |

So far, I have created a list of all my qualitative (categorical) columns:

qualitative_data = ['Color', 'Genre']

And now I'd like to do something like:

for x in qualitative_data:
    pass  # create one 0/1 column per distinct value of df[x]
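A hand-rolled version of that loop might look like the sketch below. It assumes df holds the example frame from the question, and that no two qualitative columns share a value (otherwise the new column names would collide and the later one would overwrite the earlier one). New columns appear in order of first appearance, not sorted.

```python
import pandas as pd

# Example frame copied from the question
df = pd.DataFrame({
    "Name": ["Paul", "Erik", "Igor"],
    "Surname": ["hellppp", "meeeeee", "plllsss"],
    "Color": ["Blue", "Red", "Green"],
    "Genre": ["Male", "Woman", "Male"],
})

qualitative_data = ["Color", "Genre"]

result = df.copy()
for x in qualitative_data:
    # One 0/1 column per distinct value, in order of first appearance
    for value in df[x].unique():
        result[value] = (df[x] == value).astype(int)
    # The original categorical column is no longer needed
    result = result.drop(columns=x)

print(result)
```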

I don't understand this: df.set_index(['Name','Surname']).pivot_table(index=['Name','Surname'],columns=['Color','Genre'],aggfunc='nunique',fill_value=0).reset_index() vs df.pivot_table(index=['Name','Surname'],columns=['Color','Genre'],aggfunc='nunique',fill_value=0).reset_index()....
– ansev, Commented Nov 18, 2019 at 16:32
@ansev That would be better posed as its own question. Basically it boils down to how GroupBy.nunique still returns unique counts (guaranteed to be 1) for grouping columns, though it does not do so if a grouping column is part of the index.
– ALollz, Commented Nov 18, 2019 at 17:02

ALollz · Accepted Answer · 2019-11-18 16:38:05Z


You could use get_dummies:

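import pandas as pd

# dtype=int keeps the 0/1 output; since pandas 2.0 the default dtype is bool (True/False)
result = pd.get_dummies(df, columns=['Color', 'Genre'], prefix_sep='', prefix='', dtype=int)

print(result)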

Output

   Name  Surname  Blue  Green  Red  Male  Woman
0  Paul  hellppp     1      0    0     1      0
1  Erik  meeeeee     0      0    1     0      1
2  Igor  plllsss     0      1    0     1      0
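The call above assumes df already exists. A self-contained version follows, rebuilding the example frame from the question. It also shows the default naming: without prefix='' and prefix_sep='', get_dummies prefixes each new column with its source column name (e.g. Color_Blue), which avoids clashes if two categorical columns ever share a value. Dummy columns for each source column come out in sorted order.

```python
import pandas as pd

# Example frame copied from the question
df = pd.DataFrame({
    "Name": ["Paul", "Erik", "Igor"],
    "Surname": ["hellppp", "meeeeee", "plllsss"],
    "Color": ["Blue", "Red", "Green"],
    "Genre": ["Male", "Woman", "Male"],
})

# Bare value names as column headers, 0/1 integers instead of booleans
result = pd.get_dummies(df, columns=["Color", "Genre"],
                        prefix="", prefix_sep="", dtype=int)

# Default naming keeps the source column as a prefix, e.g. "Color_Blue"
prefixed = pd.get_dummies(df, columns=["Color", "Genre"], dtype=int)

print(result)
print(prefixed.columns.tolist())
```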

edited Nov 18, 2019 at 16:38

ALollz


answered Nov 18, 2019 at 16:19

Dani Mesejo



3 Comments

ansev Over a year ago

Please look at my comment on the question; I still don't understand this result in the pandas library.

Dani Mesejo Over a year ago

@ansev What do you mean?

ansev Over a year ago

Why does pandas do this? It's not logical.
