
I'm wondering about the cleanest way to solve the problem I'm facing. I'd like to create a new column for every distinct value of a column in an existing DataFrame. I don't know in advance how many values can exist. For every row, all of the new columns should be 0, except the one matching that row's value, which should be 1. That might not be easy to follow, so here is an example, with pseudocode too:

Let's imagine I have such a DataFrame:

| Name | Surname | Color | Genre |

| Paul | hellppp | Blue  | Male  |
| Erik | meeeeee | Red   | Woman |
| Igor | plllsss | Green | Male  |

Should become

| Name | Surname | Red | Blue | Green | Male | Woman |

| Paul | hellppp | 0   | 1    | 0     | 1    | 0     |
| Erik | meeeeee | 1   | 0    | 0     | 0    | 1     |
| Igor | plllsss | 0   | 0    | 1     | 1    | 0     |

So far I have created a list containing the names of all my qualitative columns:

qualitative_data = ['Color', 'Genre']

And now I'd like to do something like:

for x in qualitative_data:
    pass
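To make the intent of that loop concrete, here is a minimal sketch of what its body could do by hand. It assumes the sample data above; the name `encoded` is hypothetical and used only for illustration, and the values in different columns are assumed not to collide (for example, no `Color` value is also a `Genre` value).

```python
import pandas as pd

# Sample data from the example above.
df = pd.DataFrame({
    "Name": ["Paul", "Erik", "Igor"],
    "Surname": ["hellppp", "meeeeee", "plllsss"],
    "Color": ["Blue", "Red", "Green"],
    "Genre": ["Male", "Woman", "Male"],
})

qualitative_data = ["Color", "Genre"]

# Work on a copy so the original frame stays untouched.
encoded = df.copy()
for x in qualitative_data:
    # One new 0/1 column per distinct value found in column x.
    for value in encoded[x].unique():
        encoded[value] = (encoded[x] == value).astype(int)
    # The original qualitative column is no longer needed.
    encoded = encoded.drop(columns=x)

print(encoded)
```

This works without knowing the set of values in advance, but it is only meant to show the desired result; a built-in pandas routine is likely to be cleaner.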

  • I don't understand the difference between df.set_index(['Name','Surname']).pivot_table(index=['Name','Surname'],columns=['Color','Genre'],aggfunc='nunique',fill_value=0).reset_index() and df.pivot_table(index=['Name','Surname'],columns=['Color','Genre'],aggfunc='nunique',fill_value=0).reset_index(). Commented Nov 18, 2019 at 16:32
  • @ansev That would be better posed as its own question. Basically it boils down to how GroupBy.nunique still returns unique counts (guaranteed to be 1) for grouping columns, though it does not do so if a grouping column is part of the index. Commented Nov 18, 2019 at 17:02

1 Answer


You could use get_dummies:

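import pandas as pd

# dtype=int gives the 0/1 columns shown below; pandas 2.0+ returns booleans by default
result = pd.get_dummies(df, columns=['Color', 'Genre'], prefix_sep='', prefix='', dtype=int)

print(result)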

Output

   Name  Surname  Blue  Green  Red  Male  Woman
0  Paul  hellppp     1      0    0     1      0
1  Erik  meeeeee     0      0    1     0      1
2  Igor  plllsss     0      1    0     1      0
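
For a self-contained run, here is a minimal sketch that rebuilds the sample frame from the question and passes the asker's `qualitative_data` list straight to `columns`, so the set of values does not need to be known in advance. `dtype=int` is passed so the new columns hold 0/1 integers, since pandas 2.0 and later return booleans by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Paul", "Erik", "Igor"],
    "Surname": ["hellppp", "meeeeee", "plllsss"],
    "Color": ["Blue", "Red", "Green"],
    "Genre": ["Male", "Woman", "Male"],
})

# List of qualitative columns, as defined in the question.
qualitative_data = ["Color", "Genre"]

# Empty prefix and separator make each new column carry the bare value name.
result = pd.get_dummies(
    df,
    columns=qualitative_data,
    prefix="",
    prefix_sep="",
    dtype=int,
)

print(result)
```

Note that with an empty prefix, two qualitative columns sharing a value would produce duplicate column names, so keeping the default prefix may be safer on real data.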

3 Comments

Please look at my comment on the question; I still don't understand this result from the pandas library.
@ansev What do you mean?
Why does pandas do this? It's not logical.
