1

My dataset has multiple columns which include lists.

   A             B                First_val   Second_value  Colour
0  aaa  ['hello', 'world']   ['bla','bla']   ['blue']       Red
1  sad  ['aaa']              ['cat']         ['hi']         Green
2  vas  ['min','max']        ['avg']         []             Yellow
...

The colour should be assigned only based on A value: this means that, if I have multiple rows because of exploding lists, I should have red all the times, then green, then yellow for all the rows having in A aaa, sad and vas respectively.

I have tried with

(df.set_index('A')
              .apply(lambda x: x.apply(pd.Series).stack())
              .reset_index()
              .drop('level_1', 1))

but the value of Colour is not assigned per all A's values, but only for the first one (the others are NaN). Example (wrong):

   A             B                First_val   Second_value  Colour
0  aaa  hello                     bla          blue       Red
1  aaa  world                     bla          nan        nan
...

Expected output:

   A             B                First_val   Second_value  Colour
0  aaa  hello                     bla          blue       Red
1  aaa  world                     bla          nan        Red
...

How can I get it?

1 Answer 1

3

I think you need add not lists column Colour to MultiIndex so not processing by converting lists to Series in DataFrame.set_index, also added DataFrame.reindex for original order of columns:

df = (df.set_index(['A','Colour'])
        .apply(lambda x: x.apply(pd.Series).stack())
        .reset_index()
        .reindex(df.columns, axis=1)
              )
print (df)
     A      B First_val Second_value  Colour
0  aaa  hello       bla         blue     Red
1  aaa  world       bla          NaN     Red
2  sad    aaa       cat           hi   Green
3  vas    min       avg          NaN  Yellow
4  vas    max       NaN          NaN  Yellow

Another I hope faster alternative with convert values of lists to DataFrames:

df = (df.set_index(['A','Colour'])
        .apply(lambda x: pd.DataFrame(x.tolist(), index=x.index).stack())
        .reset_index()
        .reindex(df.columns, axis=1)
              )
print (df)
     A      B First_val Second_value  Colour
0  aaa  hello       bla         blue     Red
1  aaa  world       bla          NaN     Red
2  sad    aaa       cat           hi   Green
3  vas    min       avg          NaN  Yellow
4  vas    max       NaN          NaN  Yellow
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.