Pandas: Convert lists within a single column to multiple columns

Question

I have a dataframe that includes columns with multiple attributes separated by commas:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

   id   labels
0   1   a,b,c
1   2   c,a
2   3   d,a,b

(I know this isn't an ideal situation, but the data originates from an external source.) I want to turn the multi-attribute columns into multiple columns, one for each label, so that I can treat them as categorical variables. Desired output:

    id  a       b       c       d   
0    1  True    True    True    False   
1    2  True    False   True    False   
2    3  True    True    False   True

I can get the set of all possible attributes ([a,b,c,d]) fairly easily, but cannot figure out a way to determine whether a given row has a particular attribute without row-by-row iteration for each attribute. Is there a better way to do this?

jezrael · Accepted Answer · 2016-05-16 20:14:46Z

You can use str.get_dummies to split the labels into indicator columns, cast the resulting 1 and 0 values to boolean with astype, and finally concat the id column back on:

print(df['labels'].str.get_dummies(sep=',').astype(bool))
      a      b      c      d
0  True   True   True  False
1  True  False   True  False
2  True   True  False   True

print(pd.concat([df.id, df['labels'].str.get_dummies(sep=',').astype(bool)], axis=1))

   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True
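
For reuse, the two steps above can be wrapped in a small function. This is only a sketch: the name expand_labels and its parameters are made up for illustration and are not part of pandas, and it assumes the labels contain no stray whitespace around the separator (a value like "a, b" would produce a separate " b" column).

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})


def expand_labels(frame, column, sep=','):
    """Hypothetical helper: swap `column` for one boolean column per label."""
    # str.get_dummies splits each string on `sep` and returns 0/1 indicator columns
    dummies = frame[column].str.get_dummies(sep=sep).astype(bool)
    # drop the original string column and attach the indicator columns side by side
    return pd.concat([frame.drop(columns=column), dummies], axis=1)


result = expand_labels(df, 'labels')
print(result)
```

Unlike the one-liner, this keeps every other column of the frame, not just id, which may or may not be what you want.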
