How to transform list of values into pandas columns?

Question

I have a DataFrame like the one below and want to turn its values into columns (a pivot operation). I can't do this directly because my data is stored in lists.

My sample input has two list columns, scores and classes, whose values are aligned by position: in the first row, class 19 has score 0.97 and class 0 has score 0.77. I want to transform the DataFrame so that the class values become column names and the corresponding scores go in the respective columns.

Sample Input:

    file_name        scores                         classes  
0  voc_32.jpg  [0.97, 0.77]                     [19.0, 0.0]   
1  voc_22.jpg  [0.92, 0.64, 0.83, 0.55]         [17.0, 1.0, 11.0, 11.0]

Expected output:

    file_name  0      1     11           17     19
0  voc_32.jpg  0.77                            0.97 
1  voc_22.jpg         0.64  [0.83, 0.55]       0.92

Any help would be appreciated.

jezrael · Accepted Answer · 2021-05-12 07:34:35Z

Create a list of dictionaries in a list comprehension and pass it to the DataFrame constructor, then add the result to the original with DataFrame.join:

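import pandas as pd
# One dict per row maps class -> score; classes missing from a row become NaN
df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.scores, df.classes)], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
# Attach the new class columns to the file names
df2 = df[['file_name']].join(df1)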

A similar solution that removes the list columns from df with DataFrame.pop:

df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.pop('scores'), df.pop('classes'))], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df.join(df1)
print (df2)
    file_name     0     1    11    17    19
0  voc_32.jpg  0.77   NaN   NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  0.83  0.92   NaN
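
To make it clearer what the list comprehension hands to the DataFrame constructor, here is a minimal plain-Python sketch using the sample values from the question. The variable names `scores`, `classes` and `rows` are illustrative and not part of the answer. It also suggests why a repeated class needs separate handling: a dict keeps only the last score seen for a given key.

```python
# Sample values copied from the question (one inner list per DataFrame row).
scores = [[0.97, 0.77], [0.92, 0.64, 0.83, 0.55]]
classes = [[19.0, 0.0], [17.0, 1.0, 11.0, 11.0]]

# Same expression as in the answer: one {class: score} dict per row.
rows = [dict(zip(c, s)) for s, c in zip(scores, classes)]

for row in rows:
    print(row)

# Class 11.0 occurs twice in the second row; the later score (0.55)
# overwrites the earlier one (0.83) because dict keys are unique.
```

Each dict becomes one row of the new DataFrame, and its keys become the column names.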

EDIT: If a class can appear more than once in a list, flatten the lists with Series.explode, then aggregate with a custom function in GroupBy.agg and reshape with Series.unstack:

f = lambda x: list(x) if len(x) > 1 else x
df1 = (df.apply(pd.Series.explode)
         .groupby(['file_name','classes'])['scores']
         .agg(f)
         .unstack()
         .rename(columns=int))
print (df1)
classes       0     1             11    17    19
file_name                                       
voc_22.jpg   NaN  0.64  [0.83, 0.55]  0.92   NaN
voc_32.jpg  0.77   NaN           NaN   NaN  0.97

df2 = df[['file_name']].join(df1, on='file_name')
print (df2)

    file_name     0     1            11    17    19
0  voc_32.jpg  0.77   NaN           NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  [0.83, 0.55]  0.92   NaN
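
As a rough alternative to the explode and groupby route, the per-row grouping can also be sketched in plain Python before any DataFrame is built. This is only an illustration under stated assumptions: `row_to_dict` is a hypothetical helper, not a pandas function and not part of the answer, and the sample values are copied from the question.

```python
from collections import defaultdict

def row_to_dict(scores, classes):
    """Map each class to its score, or to a list of scores if it repeats."""
    grouped = defaultdict(list)
    for cls, score in zip(classes, scores):
        grouped[int(cls)].append(score)  # int() turns 19.0 into 19
    # Keep a scalar for single scores and a list for repeated classes.
    return {cls: vals if len(vals) > 1 else vals[0]
            for cls, vals in sorted(grouped.items())}

scores = [[0.97, 0.77], [0.92, 0.64, 0.83, 0.55]]
classes = [[19.0, 0.0], [17.0, 1.0, 11.0, 11.0]]

rows = [row_to_dict(s, c) for s, c in zip(scores, classes)]
print(rows)
```

The resulting list of dicts can then be passed to the DataFrame constructor and joined to file_name in the same way as in the first solution.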

edited May 12, 2021 at 7:34

answered May 12, 2021 at 7:06

jezrael

3 Comments

Mohamed Thasin ah Over a year ago

Thanks for the answer. I forgot to mention that a class can appear more than once in a cell. Please look at my edited post. The suggested solution works well for my old post, but not for the edited one.

jezrael Over a year ago

@MohamedThasinah - Added to answer.

jezrael Over a year ago

@MohamedThasinah - Super, glad I could help!
