I have a data frame like this:

  df
  col1     col2
   A        [1]
   B        [1,2]
   A        [2,3,4]
   C        [1,2]
   B        [4]

Now I want to create one row for each value in the col2 list, keeping the corresponding col1 value, so that the final data frame looks like this:

  df
  col1    col2
   A       [1]
   B       [1]
   B       [2]
   A       [2]
   A       [3]
   A       [4]
   C       [1]
   C       [2]
   B       [4]

I am looking for a pandas shortcut to do this more efficiently.

  • I think dupe was wrong (partly only), so reopened. Commented Feb 18, 2021 at 7:24

1 Answer

Use DataFrame.explode, then wrap each value in a one-element list:

df2 = df.explode('col2')

df2['col2'] = df2['col2'].apply(lambda x: [x])
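For reference, here is a self-contained sketch of the explode approach on the sample data from the question. The construction of df is an assumption based on the table shown there, and the final reset_index call is an optional addition: explode keeps the original row labels, so the result otherwise has repeated index values.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed: col2 holds Python lists)
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": [[1], [1, 2], [2, 3, 4], [1, 2], [4]],
})

# One row per list element; col1 is repeated alongside each element
df2 = df.explode("col2")

# Wrap each scalar back into a one-element list
df2["col2"] = df2["col2"].apply(lambda x: [x])

# Optional: replace the repeated original labels with a fresh 0..n-1 index
df2 = df2.reset_index(drop=True)

print(df2)
```

Without the reset_index step the rows are the same, but the index would read 0, 1, 1, 2, 2, 2, 3, 3, 4.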

Another idea, which I expect to be faster on large data, is to use NumPy's np.repeat together with chain.from_iterable to flatten the values:

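from itertools import chain

import numpy as np
import pandas as pd

# Repeat each col1 value once per element of its col2 list, then flatten col2
df2 = pd.DataFrame({
        "col1": np.repeat(df.col1.to_numpy(), df.col2.str.len()),
        "col2": [[x] for x in chain.from_iterable(df.col2)]})

print(df2)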
  col1 col2
0    A  [1]
1    B  [1]
2    B  [2]
3    A  [2]
4    A  [3]
5    A  [4]
6    C  [1]
7    C  [2]
8    B  [4]
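
As a quick sanity check, the two approaches can be compared on the sample data. This is only a sketch: split_list_rows is a hypothetical helper name introduced here, and df is rebuilt from the table in the question.

```python
from itertools import chain

import numpy as np
import pandas as pd


def split_list_rows(df, key="col1", values="col2"):
    """Hypothetical helper: one output row per element of each list in `values`."""
    lengths = df[values].str.len().to_numpy()  # number of elements in each list
    return pd.DataFrame({
        key: np.repeat(df[key].to_numpy(), lengths),
        values: [[x] for x in chain.from_iterable(df[values])],
    })


# Sample frame rebuilt from the question (assumed: col2 holds Python lists)
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": [[1], [1, 2], [2, 3, 4], [1, 2], [4]],
})

by_repeat = split_list_rows(df)

# The explode-based version, with a fresh index so the two are comparable
by_explode = df.explode("col2").reset_index(drop=True)
by_explode["col2"] = by_explode["col2"].apply(lambda x: [x])

same = (
    by_repeat["col1"].tolist() == by_explode["col1"].tolist()
    and by_repeat["col2"].tolist() == by_explode["col2"].tolist()
)
print(same)
```

One difference to keep in mind: for a row whose list is empty, explode keeps the row with NaN in col2, while the np.repeat version drops it, because the row is repeated zero times.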