Create multiple columns from a single column

Question

I am working on a data frame that has a column with the following:

         Products
1           A;B
2           A
3           D;A;C

I would like to have instead:

          Has_A      Has_B        Has_C   ...
1           1          1            0
2           1          0            0

Also, as a further step, some rows contain something like "No products" or "None", and there are NaNs. I would like to put all of these into one column (if possible).

Any tips? Is this possible?

Thank you

jezrael · Accepted Answer · 2017-04-27 09:52:09Z


You can use str.get_dummies with the separator ';', then add_prefix to name the new columns:

df = df['Products'].str.get_dummies(';').add_prefix('Has_')
print (df)
   Has_A  Has_B  Has_C  Has_D
0      1      1      0      0
1      1      0      0      0
2      1      0      1      1

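For the rows with "No products", "None" or NaN, one solution is to first replace them all with a single label, using a dict created by a dict comprehension from a list of the placeholder strings plus None and NaN.

Sample: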

import numpy as np
import pandas as pd

df = pd.DataFrame({'Products': ['A;B', 'A', 'D;A;C', 'No prods', np.nan, 'None']})
print (df)
   Products
0       A;B
1         A
2     D;A;C
3  No prods
4       NaN
5      None

L = ['No prods','None']
d = {x :'No product' for x in L + [None, np.nan]}
df['Products'] = df['Products'].replace(d)
df = df['Products'].str.get_dummies(';').add_prefix('Has_')
print (df)
   Has_A  Has_B  Has_C  Has_D  Has_No product
0      1      1      0      0               0
1      1      0      0      0               0
2      1      0      1      1               0
3      0      0      0      0               1
4      0      0      0      0               1
5      0      0      0      0               1
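As a possible alternative to the dict-based replace above, the missing values can be filled first and the placeholder strings replaced afterwards. This is only a sketch under the same sample data; the variable names products and dummies are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Products': ['A;B', 'A', 'D;A;C', 'No prods', np.nan, 'None']})

# Fill missing values first, then map the placeholder strings to the same label
products = (df['Products']
            .fillna('No product')
            .replace(['No prods', 'None'], 'No product'))

# One indicator column per distinct product (and one for 'No product')
dummies = products.str.get_dummies(';').add_prefix('Has_')
print(dummies)
```

This keeps the handling of NaN separate from the handling of the placeholder strings, which may be easier to read than putting None and NaN into the dict keys.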

edited Apr 27, 2017 at 9:52

answered Apr 27, 2017 at 9:11


6 Comments

datascana Over a year ago

Thank you Jez, it worked! Simple question: can you explain the code for the variable d? (I know it's the one that answers the second part of my question.)

jezrael Over a year ago

It is a dictionary comprehension, which creates a new dict mapping each value from list L (plus None and NaN) to the replacement value 'No product'.
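To make the comprehension concrete, here is a small sketch that builds the same dict with an explicit loop. The name d_loop is illustrative only and does not appear in the answer.

```python
import numpy as np

L = ['No prods', 'None']

# The comprehension from the answer: every key maps to the same label
d = {x: 'No product' for x in L + [None, np.nan]}

# Equivalent explicit loop, written out step by step
d_loop = {}
for x in L + [None, np.nan]:
    d_loop[x] = 'No product'

print(d)
```

Passing d to replace then swaps every key found in the column for 'No product'.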

datascana Over a year ago

I found that I'm losing the other columns. Is there a way to keep the columns other than "Products"? df['Products'] = df['Products'].str.get_dummies(';').add_prefix('Has_') didn't work.

jezrael Over a year ago

I think you can use join - df = df_orig.join(df)

jezrael Over a year ago

Or concat - df = pd.concat([df_orig, df], axis=1)
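A minimal sketch of the join and concat suggestions, assuming the original frame has one extra column. The 'Id' column and its values are made up for illustration. Assigning the result of get_dummies to df['Products'] does not fit here because the result is a whole DataFrame with several columns rather than a single column.

```python
import pandas as pd

# Hypothetical original frame with an extra column to preserve
df_orig = pd.DataFrame({'Id': [10, 20, 30],
                        'Products': ['A;B', 'A', 'D;A;C']})

# Indicator columns built from 'Products' only
dummies = df_orig['Products'].str.get_dummies(';').add_prefix('Has_')

# Attach the indicator columns to the original frame by index
df = df_orig.join(dummies)

# Equivalent here, since both frames share the same index
df_concat = pd.concat([df_orig, dummies], axis=1)
print(df)
```

Both variants align rows on the index, so they give the same result as long as the index of the original frame is unchanged.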
