Let's say I have the following df, where each cell of column x holds a list of strings:

  x
1 ['abc','bac','cab']
2 ['bac']
3 ['abc','cab']

I would like to turn each distinct list element into its own column, with 1 where the row's list contains that element and 0 otherwise, like so:

   abc  bac  cab
1    1    1    1
2    0    1    0
3    1    0    1

I have looked at several related answers but can't seem to get this right.

Thanks!

  • kindly share the source code : df.to_dict() Commented Jul 21, 2021 at 1:11

2 Answers

One approach with str.join + str.get_dummies (this assumes no list element contains the separator ','):

out = df['x'].str.join(',').str.get_dummies(',')

out:

   abc  bac  cab
0    1    1    1
1    0    1    0
2    1    0    1

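Or with explode + pd.get_dummies, then groupby max to collapse back to one row per original index (on pandas 2.0+, pd.get_dummies returns booleans by default, so dtype=int is passed to keep the 0/1 output shown):

out = pd.get_dummies(df['x'].explode(), dtype=int).groupby(level=0).max()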

out:

   abc  bac  cab
0    1    1    1
1    0    1    0
2    1    0    1
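
Both approaches keep the original row labels, so the 0, 1, 2 above is just the default index of the sample DataFrame. A minimal sketch, assuming the question's frame is indexed 1, 2, 3 (the `index=` argument and the variable names `via_join` / `via_explode` are mine, not from the question):

```python
import pandas as pd

# Same data as the sample DataFrame, but with the 1-based index from the question (assumed)
df = pd.DataFrame(
    {'x': [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]},
    index=[1, 2, 3],
)

# Approach 1: join each list into one string, then split it into indicator columns
via_join = df['x'].str.join(',').str.get_dummies(',')

# Approach 2: one row per element, one-hot encode, collapse back per original label
via_explode = pd.get_dummies(df['x'].explode(), dtype=int).groupby(level=0).max()

print(via_explode)
```

Either result is indexed 1, 2, 3, matching the layout asked for in the question.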

You can also use pd.crosstab after explode if you want counts instead of 0/1 dummies:

s = df['x'].explode()
out = pd.crosstab(s.index, s)

out:

x      abc  bac  cab
row_0               
0        1    1    1
1        0    1    0
2        1    0    1

*Note: the output is the same here, but crosstab returns occurrence counts, so values can exceed 1 if a list contains duplicates.
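
To make the difference concrete, here is a small sketch with made-up data in which 'abc' appears twice in the first list (the frame `dup` and the names `counts` / `dummies` are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical data: the first list contains 'abc' twice
dup = pd.DataFrame({'x': [['abc', 'abc', 'cab'], ['bac']]})

s = dup['x'].explode()

# crosstab counts how often each element occurs per original row
counts = pd.crosstab(s.index, s)

# str.get_dummies only records presence, so duplicates still give 1
dummies = dup['x'].str.join(',').str.get_dummies(',')

print(counts)
print(dummies)
```

Here `counts` has 2 under abc for row 0, while `dummies` has 1 in the same cell.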


The DataFrame used in the examples above:

import pandas as pd

df = pd.DataFrame({
    'x': [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]
})

I would use MultiLabelBinarizer from scikit-learn:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

s = pd.DataFrame(mlb.fit_transform(df['x']), columns=mlb.classes_, index=df.index)
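
A self-contained version of the above, as a sketch using the sample DataFrame from the other answer. It requires scikit-learn to be installed; the extra `mlb.transform` call and the name `new_rows` are my additions, showing that the fitted binarizer can encode further rows with the same columns:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    'x': [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]
})

mlb = MultiLabelBinarizer()

# fit_transform learns the sorted set of labels and returns a 0/1 array
s = pd.DataFrame(mlb.fit_transform(df['x']), columns=mlb.classes_, index=df.index)
print(s)

# Reuse the fitted labels to encode additional rows (illustrative input)
new_rows = mlb.transform([['cab']])
print(new_rows)
```

Keeping the fitted `mlb` around is handy when later data must be encoded with exactly the same column order.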
