
I have this:

df = pd.DataFrame({'my_col' : ['red', 'red', 'green']})

my_col
red
red
green

I want this:

df2 = pd.DataFrame({'red' : [True, True, False], 'green' : [False, False, True]})

  red  green
 True  False
 True  False
False   True

Is there an elegant way to do this?

4 Answers

You can do this:

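# add one boolean column per distinct color: True where the row has that color
for color in df['my_col'].unique():
    df[color] = df['my_col'] == color

# keep only the new color columns
df2 = df[df['my_col'].unique()]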

This loops over each color in my_col and adds a column to df named after that color, which is True where the row equals that color and False otherwise. Finally, df2 is extracted from df by selecting only the color columns.

Another option is to start with an empty dataframe for df2 and immediately add the columns to this dataframe:

df2 = pd.DataFrame()
for color in df['my_col'].unique():
    df2[color] = df['my_col'] == color

Output:

     red  green
0   True  False
1   True  False
2  False   True
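The same per-color comparison can also be written as a dict comprehension passed straight to the DataFrame constructor, which builds all the columns at once instead of inserting them one by one. A minimal sketch using the question's data (the variable names are just the ones from the question):

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# For each distinct color (in order of first appearance), compare the whole
# column against it; Series.eq returns a boolean Series for that color.
df2 = pd.DataFrame({color: df['my_col'].eq(color) for color in df['my_col'].unique()})

print(df2)
```

This leaves df itself unchanged, unlike the first loop, which adds the color columns to df.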

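The pandas function get_dummies can do this. With dtype=bool it already returns True/False columns, so the only extra step is renaming the prefixed columns.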

import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# dtype=bool gives True/False columns named my_col_green and my_col_red
new_df = pd.get_dummies(df, dtype=bool)

# strip the 'my_col_' prefix that get_dummies adds to each column name
new_df.rename(columns={'my_col_green': 'green', 'my_col_red': 'red'}, inplace=True)
print(new_df)
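The rename step can be avoided by giving get_dummies an empty prefix and separator, so each column is named after the category value alone. A minimal sketch, assuming the question's df:

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# An empty prefix and prefix_sep make the column names just the category
# values ('green', 'red') instead of 'my_col_green' and 'my_col_red'.
new_df = pd.get_dummies(df, prefix='', prefix_sep='', dtype=bool)
print(new_df)
```

get_dummies sorts the categories, so green comes before red here.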


Considering that the original dataframe is df, one can use:

  1. pandas.get_dummies

  2. pandas.Series.str.get_dummies


Option 1

Using pandas.get_dummies, one can do the following:

df2 = pd.get_dummies(df['my_col'], dtype=bool)

[Out]:

   green    red
0  False   True
1  False   True
2   True  False

To put the red column first, select the columns in the desired order, which still fits in a one-liner:

df2 = pd.get_dummies(df['my_col'], dtype=bool)[['red', 'green']]

[Out]:

     red  green
0   True  False
1   True  False
2  False   True
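If the colors are not known in advance, the column order can be taken from the data instead of being hard-coded. A small sketch, assuming "first appearance in my_col" is the order you want:

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# get_dummies sorts categories (green before red); unique() lists them in
# order of first appearance, so selecting with it restores that order.
order = df['my_col'].unique().tolist()
df2 = pd.get_dummies(df['my_col'], dtype=bool)[order]
print(df2)
```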

Option 2

Using pandas.Series.str.get_dummies, one can do the following:

df2 = df['my_col'].str.get_dummies().astype(bool)

[Out]:

   green    red
0  False   True
1  False   True
2   True  False

To put the red column first, select the columns in the desired order, which still fits in a one-liner:

df2 = df['my_col'].str.get_dummies().astype(bool)[['red', 'green']]

[Out]:

     red  green
0   True  False
1   True  False
2  False   True
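The astype(bool) in Option 2 is needed because Series.str.get_dummies returns integer 0/1 indicator columns rather than booleans. A quick sketch to see both stages on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# str.get_dummies produces integer 0/1 columns, one per distinct value
dummies = df['my_col'].str.get_dummies()
print(dummies.dtypes)

# astype(bool) maps 1 -> True and 0 -> False
df2 = dummies.astype(bool)
print(df2)
```

Note that str.get_dummies splits each string on sep ('|' by default), so a value such as 'red|green' would set both columns; for single-valued data like this, the result matches Option 1.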

# reset the index so each row gets a unique id in an 'index' column,
# which keeps one output row per input row
df = df.reset_index()

# cross-tabulate (row id, color) against color; each cell counts
# whether that row has that color (1) or not (0)
(pd.crosstab(index=[df['index'], df['my_col']],
             columns=df['my_col'])
 .reset_index()                      # move the row keys back into columns
 .drop(columns=['index', 'my_col'])  # drop the helper key columns
 .rename_axis(columns=None)          # remove the columns axis name
 .astype(bool))                      # 1 -> True, 0 -> False (no negation needed)

   green    red
0  False   True
1  False   True
2   True  False
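The crosstab idea can be shortened by passing the row labels directly as the index, which makes the reset_index and drop steps unnecessary. A minimal sketch, assuming df still has its original index (the variable name counts is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# Each row label appears once, so every cell counts 0 or 1 occurrences
# of a color in that row.
counts = pd.crosstab(index=df.index, columns=df['my_col'])

# Turn the counts into booleans and drop the axis names crosstab adds.
df2 = counts.astype(bool).rename_axis(index=None, columns=None)
print(df2)
```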
