0

I have a df like below, I want to tranform values in to columns (pivot) operation. I am unable to perform because my data is in list.

My sample input has two columns (scores and classes). These columns are ordered values. i.e., class 19's score is 0.97 and 0.77 for class 0. I want to transform my df such that classes values will be column names and it's corresponding scores will be in respective column.

Sample Input:

    file_name        scores                         classes  
0  voc_32.jpg  [0.97, 0.77]                     [19.0, 0.0]   
1  voc_22.jpg  [0.92, 0.64, 0.83, 0.55]         [17.0, 1.0, 11.0, 11.0]   

Expected output:

    file_name  0      1     11           17     19
0  voc_32.jpg  0.77                            0.97 
1  voc_22.jpg         0.64  [0.83, 0.55]       0.92

Any help would be appreciable.

1 Answer 1

3

Create list of dictionaries in list comprehension and pass to DataFrame constructor, last add to original by DataFrame.join:

df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.scores, df.classes)], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df[['file_name']].join(df1)

Similar solution with drop columns by DataFrame.pop:

df1 = (pd.DataFrame([dict(zip(b, a)) for a, b in zip(df.pop('scores'), df.pop('classes'))], 
                     index=df.index).sort_index(axis=1).rename(columns=int))
df2 = df.join(df1)
print (df2)
    file_name     0     1    11    17    19
0  voc_32.jpg  0.77   NaN   NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  0.83  0.92   NaN

EDIT: For list if multiple classes use Series.explode for flattem, then aggregate custom function in GroupBy.agg with reshape by Series.unstack:

f = lambda x: list(x) if len(x) > 1 else x
df1 = (df.apply(pd.Series.explode)
         .groupby(['file_name','classes'])['scores']
         .agg(f)
         .unstack()
         .rename(columns=int))
print (df1)
classes       0     1             11    17    19
file_name                                       
voc_22.jpg   NaN  0.64  [0.83, 0.85]  0.92   NaN
voc_32.jpg  0.77   NaN           NaN   NaN  0.97

df2 = df[['file_name']].join(df1, on='file_name')
print (df2)

    file_name     0     1            11    17    19
0  voc_32.jpg  0.77   NaN           NaN   NaN  0.97
1  voc_22.jpg   NaN  0.64  [0.83, 0.85]  0.92   NaN
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the answer. i forgot to mention that classes can be found more than one in a cell. Please look at my edited post. The suggested solution works well for my old post. for the edited post not.
@MohamedThasinah - Added to answer.
@MohamedThasinah - Super, glad can help!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.