I am working on a data frame that has a column with the following:

         Products
1           A;B
2           A
3           D;A;C

I would like to have instead:

          Has_A      Has_B        Has_C   ...
1           1          1            0
2           1          0            0

Also, as a further step, some rows contain something like "No products" or "None", and there are NaNs. I would like to put all of these into one column (if possible).

Any tips? Is this possible?

Thank you

1 Answer

You can mainly use str.get_dummies:

df = df['Products'].str.get_dummies(';').add_prefix('Has_')
print (df)
   Has_A  Has_B  Has_C  Has_D
0      1      1      0      0
1      1      0      0      0
2      1      0      1      1

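For the second part, there is also a solution with replace: a dict created with a dict comprehension maps the placeholder strings, NaN and None to one common label, so they all end up in a single column.

Sample: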

import numpy as np
import pandas as pd

df = pd.DataFrame({'Products': ['A;B', 'A', 'D;A;C', 'No prods', np.nan, 'None']})
print (df)
   Products
0       A;B
1         A
2     D;A;C
3  No prods
4       NaN
5      None

L = ['No prods','None']
d = {x :'No product' for x in L + [None, np.nan]}
df['Products'] = df['Products'].replace(d)
df = df['Products'].str.get_dummies(';').add_prefix('Has_')
print (df)
   Has_A  Has_B  Has_C  Has_D  Has_No product
0      1      1      0      0               0
1      1      0      0      0               0
2      1      0      1      1               0
3      0      0      0      0               1
4      0      0      0      0               1
5      0      0      0      0               1
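
An equivalent way to do the same normalization without using NaN and None as dictionary keys is to replace the placeholder strings and then fill the missing values. This is only a minimal sketch: it assumes the placeholders are exactly the strings 'No prods' and 'None' from the sample, and the helper name normalize_products is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Products': ['A;B', 'A', 'D;A;C', 'No prods', np.nan, 'None']})

# Placeholder strings taken from the sample data (an assumption about real data)
placeholders = ['No prods', 'None']

def normalize_products(s, label='No product'):
    """Hypothetical helper: map placeholder strings and missing values to one label."""
    # replace handles the known strings, fillna handles NaN/None
    return s.replace(placeholders, label).fillna(label)

# Keep the result in a separate variable so the original df is not overwritten
dummies = normalize_products(df['Products']).str.get_dummies(';').add_prefix('Has_')
print(dummies)
```

Without the normalization step, rows holding NaN would get 0 in every indicator column instead of being counted under Has_No product.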

6 Comments

Thank you Jez, it worked! Simple question: can you explain the code for the variable d? (I know it's the one that answers the second part of my question.)
It is a dictionary comprehension: it creates a new dict that maps every value from the list L (plus None and NaN) to 'No product', and replace then uses that dict to substitute the data.
I found that I'm losing the other columns. Is there a way to keep the columns other than "Products"? df['Products'] = df['Products'].str.get_dummies(';').add_prefix('Has_') didn't work
I think you can use join - df = df_orig.join(df)
Or concat - df = pd.concat([df_orig, df], axis=1)
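
A small sketch of the join suggestion, assuming the original frame is kept under the name df_orig. The 'Customer' column is a hypothetical stand-in for the "other columns" mentioned above. The indicator columns are built in a separate variable rather than assigned back to df['Products'], since a multi-column result cannot be stored in a single column.

```python
import pandas as pd

# 'Customer' is a made-up extra column used only for illustration
df_orig = pd.DataFrame({
    'Customer': ['x', 'y', 'z'],
    'Products': ['A;B', 'A', 'D;A;C'],
})

# Build the indicator columns without overwriting df_orig
dummies = df_orig['Products'].str.get_dummies(';').add_prefix('Has_')

# join aligns on the index, so every row keeps its original columns
df = df_orig.join(dummies)
print(df)
```

pd.concat([df_orig, dummies], axis=1) should give the same result here, because both frames share the same index.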