1

I am trying to turn the lists in one dataframe column into rows, but I am not sure how to do that efficiently in Python. My actual data has thousands of rows and lists of variable length (in the column Specs), but to simplify I will use the example below.

import pandas as pd
data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
   {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
   {'Type': 'C', 'Specs': [['c1', 32]]} ]
df = pd.DataFrame(data)

The final result should be equivalent to the dataframe below:

data_out= [{'Type': 'A', 'model':'a1', 'qty': 50},
   {'Type': 'A', 'model':'a2', 'qty': 14},
   {'Type': 'B', 'model':'b1', 'qty': 20},
   {'Type': 'B', 'model':'b2', 'qty': 25},
   {'Type': 'B', 'model':'b3', 'qty': 15},
   {'Type': 'B', 'model':'b4', 'qty': 10},
   {'Type': 'C', 'model':'c1', 'qty': 32}]
df_out = pd.DataFrame(data_out)

I have tried using apply with a function that converts each row's list into a dataframe, but I am confused about how to return a dataframe for each row and expand the original dataframe with the new rows. Please let me know if I am on the wrong track, and what the most efficient way would be to get the required output on large data. Thanks.

def convert_list(my_list):
   my_df = pd.DataFrame(my_list, columns=['model', 'qty'])
   return my_df

df[['model', 'qty']] = df['Specs'].apply(convert_list)

3 Answers

2

You can use explode() + join() + DataFrame() + pop():

df=df.explode('Specs',ignore_index=True)
df[['Model','Qty']]=pd.DataFrame(df.pop('Specs').tolist())
#OR
#df=df.join(pd.DataFrame(df.pop('Specs').tolist(),columns=['Model','Qty']))

OR

explode()+drop()+.str accessor:

df=df.explode('Specs',ignore_index=True)
df['Model']=df['Specs'].str[0]
df['Qty']=df['Specs'].str[1]
df=df.drop(columns='Specs')  # positional axis (drop('Specs',1)) is no longer accepted in pandas 2.x

OR

explode()+pop()+agg():

df=df.explode('Specs',ignore_index=True)
df[['Model','Qty']]=df.pop('Specs').agg(pd.Series)

output of df:

   Type     Model   Qty
0   A       a1      50
1   A       a2      14
2   B       b1      20
3   B       b2      25
4   B       b3      15
5   B       b4      10
6   C       c1      32
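
As a quick sanity check, the first variant can be compared against the expected frame from the question. This is only a sketch: it assumes the lowercase column names model and qty from the question rather than Model and Qty, a pandas version where explode accepts ignore_index (1.1 or later), and the names out, specs and expected are introduced here for illustration.

```python
import pandas as pd

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

# Expected result, as given in the question
expected = pd.DataFrame([
    {'Type': 'A', 'model': 'a1', 'qty': 50},
    {'Type': 'A', 'model': 'a2', 'qty': 14},
    {'Type': 'B', 'model': 'b1', 'qty': 20},
    {'Type': 'B', 'model': 'b2', 'qty': 25},
    {'Type': 'B', 'model': 'b3', 'qty': 15},
    {'Type': 'B', 'model': 'b4', 'qty': 10},
    {'Type': 'C', 'model': 'c1', 'qty': 32},
])

# One row per inner [model, qty] pair, with a fresh 0..n-1 index
out = df.explode('Specs', ignore_index=True)

# Split each pair into two columns, aligned on the new index
specs = pd.DataFrame(out.pop('Specs').tolist(),
                     columns=['model', 'qty'], index=out.index)
out = out.join(specs)

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(out, expected)
```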

1

You don't need a custom function or any joins/merges; just use explode.

DOCS: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html

import pandas as pd
data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
   {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
   {'Type': 'C', 'Specs': [['c1', 32]]} ]
df = pd.DataFrame(data)
# Code Example
df=df.explode('Specs').reset_index(drop=True)
df[['model','qty']] =  pd.DataFrame(df["Specs"].to_list())
df.drop('Specs', axis=1, inplace=True)
df
|    | Type   | model   |   qty |
|---:|:-------|:--------|------:|
|  0 | A      | a1      |    50 |
|  1 | A      | a2      |    14 |
|  2 | B      | b1      |    20 |
|  3 | B      | b2      |    25 |
|  4 | B      | b3      |    15 |
|  5 | B      | b4      |    10 |
|  6 | C      | c1      |    32 |

PS: That's it. If it is still slow, I would advise looking at something parallelised!
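
Before reaching for parallelism, one thing that may be worth trying is flattening the nested lists in plain Python and building the dataframe once. This is a different technique from explode, it has not been benchmarked here, and flatten_specs with its parameter names is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def flatten_specs(df, keep='Type', list_col='Specs', names=('model', 'qty')):
    """Hypothetical helper: one output row per inner list in `list_col`."""
    # Pair each value of `keep` with every inner [model, qty] list of its row
    rows = [(key, *spec)
            for key, specs in zip(df[keep], df[list_col])
            for spec in specs]
    return pd.DataFrame(rows, columns=[keep, *names])

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

flat = flatten_specs(df)
```

Note that rows with an empty Specs list simply produce no output rows in this sketch, whereas explode keeps such rows with a missing value.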

1 Comment

Your code example was fast enough, so I won't need to look into the parallel process at least for now. Thanks
0

In your case, try:

s = df.pop('Specs').explode()
pd.DataFrame(s.tolist(),columns=['Model','Qty'],index=s.index).join(df)
Out[84]: 
  Model  Qty Type
0    a1   50    A
0    a2   14    A
1    b1   20    B
1    b2   25    B
1    b3   15    B
1    b4   10    B
2    c1   32    C
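
The output above keeps the original row labels (0, 0, 1, ...) and puts Type last. If a fresh index and the column order from the question are wanted, a possible follow-up is sketched below; the name out is introduced here for illustration.

```python
import pandas as pd

data = [{'Type': 'A', 'Specs': [['a1', 50], ['a2', 14]]},
        {'Type': 'B', 'Specs': [['b1', 20], ['b2', 25], ['b3', 15], ['b4', 10]]},
        {'Type': 'C', 'Specs': [['c1', 32]]}]
df = pd.DataFrame(data)

# Same steps as above: explode the popped column, then rebuild and join
s = df.pop('Specs').explode()
out = (pd.DataFrame(s.tolist(), columns=['Model', 'Qty'], index=s.index)
         .join(df)                          # bring Type back via the shared index
         .reset_index(drop=True)            # replace repeated labels with 0..n-1
         [['Type', 'Model', 'Qty']])        # reorder columns
```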
