4

I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv("filename.csv")

df 
     A   B         C         D        E    
0    a  0.469112 -0.282863 -1.509059  cat  
1    c -1.135632  1.212112 -0.173215  dog   
2    e  0.119209 -1.044236 -0.861849  dog   
3    f -2.104569 -0.494929  1.071804  bird   
4    g -2.224569 -0.724929  2.234213  elephant
...

I would like to create more columns based on the identity of categorical values in column E such that the dataframe looks like this:

 df 
         A   B         C         D        cat    dog     bird    elephant ....    
    0    a  0.469112 -0.282863 -1.509059  -1      0       0       0
    1    c -1.135632  1.212112 -0.173215   0     -1       0       0
    2    e  0.119209 -1.044236 -0.861849   0     -1       0       0
    3    f -2.104569 -0.494929  1.071804   0      0      -1       0
    4    g -2.224569 -0.724929  2.234213   0      0       0       0
    ...

That is, I pivot the values for column E to be a binary matrix based on the values of E, giving 1 if the value exists, and 0 for all others where it doesn't (here, I would like it to be -1 or a "negative binary matrix")?

I'm not sure which function in pandas best does this: maybe pandas.DataFrame.unstack()?

Any insight appreciated!

1 Answer 1

8

use pd.concat, drop, and get_dummies

pd.concat([df.drop('E', 1), pd.get_dummies(df.E).mul(-1)], axis=1)

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

Very fast and practicle!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.