I have a DataFrame like this:

name category   dummy
USA   fx,ft,fe   1
INDIA fx         13

I need to convert it to:

name  category_fx category_ft category_fe dummy
USA   True        True        True        1
INDIA True        False       False       13

I tried Series.explode() but could not get this output.

2 Answers

Use Series.str.get_dummies on the category column, convert the resulting 0/1 values to booleans with DataFrame.astype, and prefix the new column names with DataFrame.add_prefix:

c = df.columns.difference(['category'], sort=False).tolist()
df = (df.set_index(c)['category']
        .str.get_dummies(',')
        .astype(bool)
        .add_prefix('category_')
        .reset_index())
print (df)
    name  dummy  category_fe  category_ft  category_fx
0    USA      1         True         True         True
1  INDIA     13        False        False         True
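
To try this end to end, here is a minimal sketch that rebuilds the frame from the question (the constructor call is my assumption, based on the table shown there) and looks at what str.get_dummies returns before the astype step. The names indicators and flags are only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category": ["fx,ft,fe", "fx"],
    "dummy": [1, 13],
})

# One 0/1 indicator column per distinct token found after splitting on ",".
indicators = df["category"].str.get_dummies(",")

# Convert the indicators to booleans and prefix the column names.
flags = indicators.astype(bool).add_prefix("category_")

print(indicators)
print(flags.dtypes)
```

Note that the indicator columns come back in sorted order (fe, ft, fx), not in the order the tokens appear in the strings.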

EDIT: If you need to replace the one column with multiple columns in the same position, you can use:

df1 = (df['category']
        .str.get_dummies(',')
        .astype(bool)
        .add_prefix('category_'))

pos = df.columns.get_loc('category')
df = pd.concat([df.iloc[:, :pos], df1, df.iloc[:, pos+1:]], axis=1)
print (df)
    name  category_fe  category_ft  category_fx  dummy
0    USA         True         True         True      1
1  INDIA        False        False         True     13
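
The question lists the new columns as fx, ft, fe, while get_dummies returns them sorted. If that order matters, one possible sketch is to reorder the indicator columns by first appearance before inserting them. The frame construction and the names order, flags and out are assumptions for illustration.

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category": ["fx,ft,fe", "fx"],
    "dummy": [1, 13],
})

# Tokens in order of first appearance across all rows.
order = list(df["category"].str.split(",").explode().unique())

# Build boolean indicators, reorder them, then prefix the names.
flags = (df["category"]
           .str.get_dummies(",")
           .astype(bool)[order]
           .add_prefix("category_"))

# Put the new columns where 'category' used to be.
pos = df.columns.get_loc("category")
out = pd.concat([df.iloc[:, :pos], flags, df.iloc[:, pos + 1:]], axis=1)
print(out)
```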

This solution is modified for multiple columns:

print (df)
    name  category  dummy category1
0    USA  fx,ft,fe      1       a,f
1  INDIA        fx     13       s,a

cols = ['category','category1']

dfs = [(df[c].str.get_dummies(',').astype(bool).add_prefix(f'{c}_')) for c in cols]

df = pd.concat([df, *dfs], axis=1).drop(cols, axis=1)
print (df)
    name  dummy  category_fe  category_ft  category_fx  category1_a  \
0    USA      1         True         True         True         True   
1  INDIA     13        False        False         True         True   

   category1_f  category1_s  
0         True        False  
1        False         True   

6 Comments

This ignores any other columns present in the df. Can the other columns be kept as well?
@Kowsi - answer was edited.
@Kowsi - added output like the one needed in the question.
@Kowsi - so you don't need dummy as the last column?
What if there are other columns like category? I need to do the same for some other columns as well.

You can use str.get_dummies and astype(bool) to convert your strings to new columns of booleans, then add_prefix to change the column names, and finally join:

df2 = (df.drop(columns='category')
         .join(df['category']
              .str.get_dummies(sep=',')
              .astype(bool)
              .add_prefix('category_')
              )
      )

or, for modification of the original dataframe:

df = df.join(df.pop('category')
               .str.get_dummies(sep=',')
               .astype(bool)
               .add_prefix('category_'))

output:

    name  dummy  category_fe  category_ft  category_fx
0    USA      1         True         True         True
1  INDIA     13        False        False         True
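
Since the question mentions Series.explode, here is a hedged sketch of how that route could reach the same result. It swaps str.get_dummies for the top-level pd.get_dummies applied to the exploded tokens, followed by a per-row max. The frame construction and the names tokens, flags and out are assumptions for illustration.

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category": ["fx,ft,fe", "fx"],
    "dummy": [1, 13],
})

# One row per token; the original row label is repeated in the index.
tokens = df["category"].str.split(",").explode()

# One indicator row per token, then collapse back to one row per
# original label: a flag is True if any token of that row matched.
flags = (pd.get_dummies(tokens, prefix="category")
           .groupby(level=0).max()
           .astype(bool))

out = df.drop(columns="category").join(flags)
print(out)
```

explode on its own only reshapes the data to one token per row, so it still needs an indicator step and an aggregation back to the original rows.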

Generalization to more columns

assuming this input:

    name category1 category2  dummy
0    USA  fx,ft,fe     a,b,c      1
1  INDIA        fx         d     13

cats = df.filter(like='category').columns
cols = list(df.columns.difference(cats))
(df
 .set_index(cols)
 .stack()
 .str.get_dummies(sep=',')
 .groupby(level=cols).max().astype(bool)
 .reset_index()
)

output:

   dummy   name      a      b      c      d     fe     ft    fx
0      1    USA   True   True   True  False   True   True  True
1     13  INDIA  False  False  False   True  False  False  True
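
The comments below mention melting first and then applying get_dummies. A minimal sketch of that idea follows, with a groupby/max standing in for the final pivot step. The frame construction and the names cats, id_cols, long, dummies and out are assumptions for illustration.

```python
import pandas as pd

# Assumed reconstruction of the two-category input shown above.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category1": ["fx,ft,fe", "fx"],
    "category2": ["a,b,c", "d"],
    "dummy": [1, 13],
})

cats = list(df.filter(like="category").columns)
id_cols = [c for c in df.columns if c not in cats]

# Long format: one row per (original row, category column).
long = df.melt(id_vars=id_cols, value_vars=cats,
               var_name="source", value_name="labels")

# 0/1 indicators for every token in the melted strings.
dummies = long["labels"].str.get_dummies(sep=",")

# Collapse back to one row per original record.
out = (pd.concat([long[id_cols], dummies], axis=1)
         .groupby(id_cols, sort=False).max()
         .astype(bool)
         .reset_index())
print(out)
```

As with the stack version, the resulting columns no longer say which source column a token came from, so identical tokens in different category columns would be merged into one flag.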

6 Comments

@Kowsi this works with any other column present
No, the other columns are not showing. Only name and the new category columns are showing.
In the second one it did, and I edited the first one to handle it as you want (before, you needed to pass the list of columns to keep).
Does it handle the other columns? For example, does it separate other comma-separated columns into new ones?
@Kowsi no, if you want to handle more columns you can first melt, then get_dummies, then pivot (or stack/unstack if using index). But this is a different question IMO.