How to transform list of variable length from one python dataframe column into rows?

Question

I am trying to transform the lists in one DataFrame column into rows, but I am not sure how to do that efficiently in Python. My actual data has thousands of rows and lists of variable length (in the column Specs), but to simplify, I will use the example below.

import pandas as pd
data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
   {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
   {'Type': 'C', 'Specs': [['c1', 32]]} ]
df = pd.DataFrame(data)

The final result should be equivalent to the output from the dataframe below

data_out= [{'Type': 'A', 'model':'a1', 'qty': 50},
   {'Type': 'A', 'model':'a2', 'qty': 14},
   {'Type': 'B', 'model':'b1', 'qty': 20},
   {'Type': 'B', 'model':'b2', 'qty': 25},
   {'Type': 'B', 'model':'b3', 'qty': 15},
   {'Type': 'B', 'model':'b4', 'qty': 10},
   {'Type': 'C', 'model':'c1', 'qty': 32}]
df_out = pd.DataFrame(data_out)

I have tried to use apply with a function that converts each row's list to a DataFrame, but I am confused about how to return a DataFrame for each row and expand the original DataFrame with the new rows. Please let me know if I am on the wrong track, and what the most efficient way would be to get the required output on large data. Thanks

def convert_list(my_list):
   my_df = pd.DataFrame(my_list, columns=['model', 'qty'])
   return my_df

df[['model', 'qty']] = df['Specs'].apply(convert_list)

Anurag Dabas · Accepted Answer · 2021-08-01 14:32:17Z

2

You can use explode() + join() + DataFrame() + pop():

df=df.explode('Specs',ignore_index=True)
df[['Model','Qty']]=pd.DataFrame(df.pop('Specs').tolist())
#OR
#df=df.join(pd.DataFrame(df.pop('Specs').tolist(),columns=['Model','Qty']))

OR

explode()+drop()+.str accessor:

df=df.explode('Specs',ignore_index=True)
df['Model']=df['Specs'].str[0]
df['Qty']=df['Specs'].str[1]
df=df.drop('Specs',axis=1)

OR

explode()+pop()+agg():

df=df.explode('Specs',ignore_index=True)
df[['Model','Qty']]=df.pop('Specs').agg(pd.Series)

output of df:

   Type     Model   Qty
0   A       a1      50
1   A       a2      14
2   B       b1      20
3   B       b2      25
4   B       b3      15
5   B       b4      10
6   C       c1      32
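
For reference, the first variant can be run end to end as the sketch below. It is a minimal version under a few assumptions: it uses the lowercase column names model and qty from the question instead of Model/Qty, it writes the result to a new variable out (a name chosen here for illustration), and it needs pandas 1.1 or later because of ignore_index in explode().

```python
import pandas as pd

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

# One row per [model, qty] pair; ignore_index=True renumbers the rows 0..n-1
out = df.explode('Specs', ignore_index=True)

# pop() removes 'Specs' and returns it; each two-element list becomes two columns.
# Passing index=out.index keeps the new columns aligned with the exploded rows.
out[['model', 'qty']] = pd.DataFrame(out.pop('Specs').tolist(), index=out.index)

print(out)
```

The index passed to pd.DataFrame matters only when the exploded frame does not have a default 0..n-1 index; with ignore_index=True it is a safeguard rather than a requirement.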

edited Aug 1, 2021 at 14:32

answered Aug 1, 2021 at 14:20

Anurag Dabas


Kuldeep Singh Sidhu · Accepted Answer · 2021-08-02 04:20:37Z

1

You don't have to write any custom function or joins/merges; just use explode.

DOCS: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html

import pandas as pd
data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
   {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
   {'Type': 'C', 'Specs': [['c1', 32]]} ]
df = pd.DataFrame(data)

# Code Example
df=df.explode('Specs').reset_index(drop=True)
df[['model','qty']] =  pd.DataFrame(df["Specs"].to_list())
df.drop('Specs', axis=1, inplace=True)
df

|    | Type   | model   |   qty |
|---:|:-------|:--------|------:|
|  0 | A      | a1      |    50 |
|  1 | A      | a2      |    14 |
|  2 | B      | b1      |    20 |
|  3 | B      | b2      |    25 |
|  4 | B      | b3      |    15 |
|  5 | B      | b4      |    10 |
|  6 | C      | c1      |    32 |

PS: That's it. If it is still slow, I would advise looking at something that is parallelised!
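
Before reaching for parallel processing, one alternative worth timing is to flatten the lists in plain Python and build the frame once. This is a different technique from explode, not part of the answer above, and whether it is faster depends on the data, so treat it as a sketch to benchmark. The names rows and out are illustrative, and it assumes every inner list has exactly two elements.

```python
import pandas as pd

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

# Walk each (Type, Specs) pair and emit one tuple per [model, qty] entry.
# Rows whose Specs list is empty simply contribute no tuples.
rows = [(type_, model, qty)
        for type_, specs in zip(df['Type'], df['Specs'])
        for model, qty in specs]

out = pd.DataFrame(rows, columns=['Type', 'model', 'qty'])
print(out)
```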

edited Aug 2, 2021 at 4:20

answered Aug 1, 2021 at 14:21

Kuldeep Singh Sidhu


1 Comment

Wyse09 Over a year ago

Your code example was fast enough, so I won't need to look into the parallel process at least for now. Thanks

BENY · Accepted Answer · 2021-08-01 14:58:41Z

0

In your case try

s = df.pop('Specs').explode()
pd.DataFrame(s.tolist(),columns=['Model','Qty'],index=s.index).join(df)
Out[84]: 
  Model  Qty Type
0    a1   50    A
0    a2   14    A
1    b1   20    B
1    b2   25    B
1    b3   15    B
1    b4   10    B
2    c1   32    C
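
The output above keeps the original row labels (0, 0, 1, ...) and puts Type last. If the layout from the question is needed, a possible follow-up is to reset the index and reorder the columns. The sketch below assumes the same sample data; the variable name out is illustrative.

```python
import pandas as pd

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

# pop() removes 'Specs' from df; explode() repeats the original index per element
s = df.pop('Specs').explode()

out = (pd.DataFrame(s.tolist(), columns=['Model', 'Qty'], index=s.index)
         .join(df)                        # bring 'Type' back via the shared index
         .reset_index(drop=True)          # replace 0, 0, 1, ... with 0..n-1
         [['Type', 'Model', 'Qty']])      # put 'Type' first
print(out)
```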

answered Aug 1, 2021 at 14:58

BENY
