from list variable to columns in pandas

Question

I have a pandas DataFrame that looks like this:

user    items
1       ["product1", "product2", "product3"]
2       ["product5", "product7", "product2"]
3       ["product1", "product4", "product5"]

I have 2 million users, each with a list of 100 products. I need to transform my DataFrame like this:

user    item_1        item_2        item_3
1       "product1"    "product2"    "product3"
2       "product5"    "product7"    "product2"
3       "product1"    "product4"    "product5"

Does anyone have a quick, "pythonic" way to do this? I'd like to avoid for loops because they take too much time.

Thank you

piRSquared · Accepted Answer · 2017-06-16 19:18:26Z


You can rebuild the item columns from df['items'].values.tolist() and join them back onto the user column.
I went this direction because it's faster than apply(pd.Series).

Given the size of your data, you'll want this rather than apply:

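import pandas as pd

# one column per list position, aligned on df's index, renamed item_1 ... item_k
df.drop(columns='items').join(
    pd.DataFrame(df['items'].values.tolist(), df.index).rename(
        columns=lambda x: 'item_{}'.format(x + 1)
    )
)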

   user    item_1    item_2    item_3
0     1  product1  product2  product3
1     2  product5  product7  product2
2     3  product1  product4  product5
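A self-contained version of the join approach, run on the three-row sample from the question (the sample frame is typed in here as an assumption; in practice df is your existing data). It uses df.drop(columns='items'), the keyword form current pandas versions expect, and Series.tolist(), which gives the same list of lists as .values.tolist():

```python
import pandas as pd

# Sample data from the question (three users, three items each).
df = pd.DataFrame({
    "user": [1, 2, 3],
    "items": [
        ["product1", "product2", "product3"],
        ["product5", "product7", "product2"],
        ["product1", "product4", "product5"],
    ],
})

# Expand the list column into one column per position (0, 1, 2, ...),
# keeping the original index so the join lines up row by row.
items_wide = pd.DataFrame(df["items"].tolist(), index=df.index)

# Rename 0, 1, 2 -> item_1, item_2, item_3 to match the requested layout.
items_wide = items_wide.rename(columns=lambda x: f"item_{x + 1}")

result = df.drop(columns="items").join(items_wide)
print(result)
```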

We can shave a bit more time off by working with NumPy arrays directly:

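import numpy as np
import pandas as pd

# 2-D array of item strings (assumes every list has the same length)
items_array = np.array(df['items'].values.tolist())
# column names 'item_1' ... 'item_k' (np.char is the public name for np.core.defchararray)
cols = np.char.add(
    'item_', np.arange(1, items_array.shape[1] + 1).astype(str)
)
# one array holds one dtype, so 'user' is converted to strings explicitly
pd.DataFrame(
    np.column_stack([df['user'].values.astype(str), items_array]),
    columns=np.append('user', cols)
)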
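A NumPy array holds a single dtype, so in the column_stack version the user column cannot stay integer; it comes back as strings alongside the items. If you need user to remain numeric, one alternative (a variation suggested here, not the version timed below) is to build only the item block from the array and insert user afterwards. A sketch, assuming the question's sample data and lists of equal length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user": [1, 2, 3],
    "items": [
        ["product1", "product2", "product3"],
        ["product5", "product7", "product2"],
        ["product1", "product4", "product5"],
    ],
})

# Assumes every list has the same length, so this is a 2-D string array.
items_array = np.array(df["items"].tolist())

# Column names item_1 ... item_k built with plain Python formatting.
cols = [f"item_{i}" for i in range(1, items_array.shape[1] + 1)]

# Build the item block directly from the array, then put 'user' in front.
# Unlike column_stack, this never mixes ints and strings in one array,
# so 'user' keeps its integer dtype.
result = pd.DataFrame(items_array, columns=cols, index=df.index)
result.insert(0, "user", df["user"].to_numpy())
print(result.dtypes)
```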

Timing

%timeit df[['user']].join(df['items'].apply(pd.Series).add_prefix('item_'))
%timeit df.drop(columns='items').join(pd.DataFrame(df['items'].values.tolist(), df.index).rename(columns=lambda x: 'item_{}'.format(x + 1)))

1000 loops, best of 3: 1.8 ms per loop
1000 loops, best of 3: 1.34 ms per loop

%%timeit
items_array = np.array(df['items'].values.tolist())
cols = np.char.add(
    'item_', np.arange(1, items_array.shape[1] + 1).astype(str)
)
pd.DataFrame(
    np.column_stack([df['user'].values, items_array]),
    columns=np.append('user', cols)
)

10000 loops, best of 3: 188 µs per loop

Larger data

import numpy as np
import pandas as pd

n = 20000
items = ['A%s' % i for i in range(1000)]
df = pd.DataFrame(dict(
    user=np.arange(n),
    items=np.random.choice(items, (n, 100)).tolist()
))

%timeit df[['user']].join(df['items'].apply(pd.Series).add_prefix('item_'))
%timeit df.drop(columns='items').join(pd.DataFrame(df['items'].values.tolist(), df.index).rename(columns=lambda x: 'item_{}'.format(x + 1)))

1 loop, best of 3: 3.22 s per loop
1 loop, best of 3: 492 ms per loop

%%timeit
items_array = np.array(df['items'].values.tolist())
cols = np.char.add(
    'item_', np.arange(1, items_array.shape[1] + 1).astype(str)
)
pd.DataFrame(
    np.column_stack([df['user'].values, items_array]),
    columns=np.append('user', cols)
)

1 loop, best of 3: 389 ms per loop
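%timeit and %%timeit are IPython magics. In a plain Python script, a rough equivalent is timeit.repeat from the standard library, taking the minimum as the "best of 3" output above does. A sketch comparing the apply and tolist approaches, assuming a smaller frame (n = 1000) so it runs quickly; the function names are illustrative, and absolute numbers will depend on your hardware and library versions:

```python
import timeit

import numpy as np
import pandas as pd

# Smaller version of the "larger data" frame so the sketch runs quickly.
n = 1000
items = ["A%s" % i for i in range(1000)]
df = pd.DataFrame(dict(
    user=np.arange(n),
    items=np.random.choice(items, (n, 100)).tolist(),
))


def with_apply():
    return df[["user"]].join(df["items"].apply(pd.Series).add_prefix("item_"))


def with_tolist():
    wide = pd.DataFrame(df["items"].tolist(), index=df.index)
    return df.drop(columns="items").join(wide.add_prefix("item_"))


# Both give the same values and column names (item_0 ... item_99 here).
assert list(with_apply().columns) == list(with_tolist().columns)
assert with_apply().values.tolist() == with_tolist().values.tolist()

for func in (with_apply, with_tolist):
    # Best of 3 repeats, 3 calls each, reported per call.
    best = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{func.__name__}: {best * 1000:.1f} ms per call")
```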

edited Jun 16, 2017 at 19:18

answered Jun 16, 2017 at 18:08

piRSquared


2 Comments

Mohamed AL ANI Over a year ago

I tried it on 200 rows and it works. Both methods were taking too long and I had to leave; I'll run this tomorrow and come back with the run time. By the way, I actually have 100 products, not 30.

Mohamed AL ANI Over a year ago

Well, it took 2.25 seconds. Thank you very much! :)

Abdou · Accepted Answer · 2017-06-16 18:12:27Z


You can try:

import pandas as pd
df[['user']].join(df['items'].apply(pd.Series).add_prefix('item_'))

Should yield:

#    user    item_0    item_1    item_2
# 0     1  product1  product2  product3
# 1     2  product5  product7  product2
# 2     3  product1  product4  product5
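Note that this names the columns item_0, item_1, ... because apply(pd.Series) labels the new columns by list position, starting at 0. If you want the 1-based names from the question, one option is to shift the labels before adding the prefix. A sketch on the question's sample data (typed in here as an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "user": [1, 2, 3],
    "items": [
        ["product1", "product2", "product3"],
        ["product5", "product7", "product2"],
        ["product1", "product4", "product5"],
    ],
})

# apply(pd.Series) turns each list into a row with columns 0, 1, 2.
expanded = df["items"].apply(pd.Series)

# Shift the positional labels by one so the prefix gives item_1, item_2, ...
expanded.columns = expanded.columns + 1

result = df[["user"]].join(expanded.add_prefix("item_"))
print(result)
```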

I hope this helps.

answered Jun 16, 2017 at 18:12

Abdou


1 Comment

Mohamed AL ANI Over a year ago

Thanks Abdou! :)
