Create pandas duplicate rows based on the number of items in a list type column

Question

I have a DataFrame like this:

  df
  col1     col2
   A        [1]
   B        [1,2]
   A        [2,3,4]
   C        [1,2]
   B        [4]
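
For anyone who wants to reproduce this, here is a minimal sketch that builds the frame above. It assumes col2 holds plain Python lists of integers, which the printed output suggests but does not confirm.

```python
import pandas as pd

# Sample frame from the question; col2 is assumed to hold Python lists of ints
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": [[1], [1, 2], [2, 3, 4], [1, 2], [4]],
})

# Number of items per list, i.e. how many rows each input row should become
lengths = df["col2"].str.len()
```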

Now I want to create one row per value in each col2 list, keeping the col1 value the same for every new row, so the final DataFrame would look like this:

  df
  col1    col2
   A       [1]
   B       [1]
   B       [2]
   A       [2]
   A       [3]
   A       [4]
   C       [1]
   C       [2]
   B       [4]

I am looking for a pandas shortcut to do this more efficiently.

I think the duplicate was only partly right, so I reopened the question.

– jezrael, Commented Feb 18, 2021 at 7:24

jezrael · Accepted Answer · 2021-02-18 07:15:14Z


Use DataFrame.explode, then wrap each value in a one-element list:

df2 = df.explode('col2')

df2['col2'] = df2['col2'].apply(lambda x: [x])
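
As a self-contained sketch of the same two steps, assuming the sample data from the question: note that explode keeps the original index labels, so they repeat; the final reset_index call is an optional extra step, not part of the snippet above.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": [[1], [1, 2], [2, 3, 4], [1, 2], [4]],
})

# One row per list element; the original index labels are repeated
df2 = df.explode("col2")

# Wrap each scalar back into a one-element list
df2["col2"] = df2["col2"].apply(lambda x: [x])

# Optional: replace the repeated index labels with 0..n-1
df2 = df2.reset_index(drop=True)
```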

Another idea, which I hope is faster on large data, is to use NumPy's np.repeat together with chain.from_iterable to flatten the values:

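from itertools import chain

import numpy as np
import pandas as pd

# repeat each col1 value once per list item, and flatten col2 into one-element lists
df2 = pd.DataFrame({
        "col1": np.repeat(df.col1.to_numpy(), df.col2.str.len()),
        "col2": [[x] for x in chain.from_iterable(df.col2)]})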

print(df2)
  col1 col2
0    A  [1]
1    B  [1]
2    B  [2]
3    A  [2]
4    A  [3]
5    A  [4]
6    C  [1]
7    C  [2]
8    B  [4]
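
One difference between the two approaches may matter if some lists are empty. The sketch below uses a small made-up frame (not the question's data) to illustrate it: explode keeps a row with NaN for an empty list, while the np.repeat version drops that row because its repeat count is 0.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical data: the second row has an empty list
df = pd.DataFrame({"col1": ["A", "B"], "col2": [[1, 2], []]})

# explode keeps row "B" and fills col2 with NaN
via_explode = df.explode("col2").reset_index(drop=True)

# np.repeat repeats "B" zero times, so the row disappears
via_repeat = pd.DataFrame({
    "col1": np.repeat(df["col1"].to_numpy(), df["col2"].str.len()),
    "col2": list(chain.from_iterable(df["col2"])),
})

print(len(via_explode), len(via_repeat))
```

If empty lists can occur, pick the approach whose behavior matches what you want, or filter those rows first.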

edited Feb 18, 2021 at 7:15

answered Feb 18, 2021 at 7:09
