How to transform a DataFrame so that column values are row values

Question

I have the following dataframe, which looks like the below:

import pandas as pd

df = pd.DataFrame({'fruit': ['berries', 'berries', 'berries', 'tropical',
                             'tropical', 'tropical', 'berries', 'nuts'],
                   'code': [100, 100, 100, 200, 200, 300, 400, 500],
                   'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                               '400A', '500A']})


    code    fruit    subcode
  0 100     berries  100A
  1 100     berries  100B
  2 100     berries  100C
  3 200     tropical 200A
  4 200     tropical 200B
  5 300     tropical 300A
  6 400     berries  400A
  7 500     nuts     500A

I want to transform the dataframe to this format:

    code    fruit    subcode1 subcode2 subcode3
  0 100     berries  100A     100B     100C
  3 200     tropical 200A     200B
  5 300     tropical 300A
  6 400     berries  400A
  7 500     nuts     500A

Unfortunately, I'm stuck on how to proceed. I've consulted posts such as "Unmelt Pandas DataFrame" and have tried combinations of stack and unstack. I suspect that some concatenation is involved, too. I would appreciate any advice pointing me in the right direction!

Bharath M Shetty · Accepted Answer · 2018-06-22 17:09:38Z

4

You can use groupby, take each group's values, and expand them into columns with pd.Series.

(df.groupby(['code', 'fruit'])['subcode']
   .apply(lambda x: x.values)   # one array of subcodes per group
   .apply(pd.Series)            # expand each array into columns
   .add_prefix('subcode_'))

                subcode_0 subcode_1 subcode_2
code fruit                                 
100  berries       100A      100B      100C
200  tropical      200A      200B       NaN
300  tropical      300A       NaN       NaN
400  berries       400A       NaN       NaN
500  nuts          500A       NaN       NaN
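
As a possible way around the apply(pd.Series) step, which tends to be slow on larger frames, here is a minimal sketch that collects each group's subcodes into a list and builds the wide frame with a single DataFrame call. This is an alternative, not the code from the answer; the names lists and wide are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['berries', 'berries', 'berries', 'tropical',
              'tropical', 'tropical', 'berries', 'nuts'],
    'code': [100, 100, 100, 200, 200, 300, 400, 500],
    'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                '400A', '500A'],
})

# Collect each (code, fruit) group's subcodes into a plain Python list.
lists = df.groupby(['code', 'fruit'])['subcode'].agg(list)

# Build the wide frame in one call; shorter lists are padded with
# missing values so every row has the same number of columns.
wide = (pd.DataFrame(lists.tolist(), index=lists.index)
          .add_prefix('subcode_')
          .reset_index())

print(wide)
```

The reset_index() call is optional; drop it to keep code and fruit in the index, as in the output shown above.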

edited Jun 22, 2018 at 17:09

answered Jun 22, 2018 at 16:58

Bharath M Shetty


5 Comments

cs95 Over a year ago

I like this approach, but I dislike the apply(Series). Good effort though!

Bharath M Shetty Over a year ago

I do agree, it consumes a lot of time.

ALollz Over a year ago

Is there any difference between applying ravel versus list?

Bharath M Shetty Over a year ago

@ALollz I just realized that's unnecessary.

imstuck Over a year ago

Thanks so much! This totally works on my data set, and I learned that there are .add_prefix/.add_suffix methods for row/column labels.

cs95 · Accepted Answer · 2018-06-22 16:58:52Z

4

Play around a bit with set_index and unstack, and you'll get it.

(df.set_index(['code', 'fruit'])
   .set_index(df.subcode.str.extract('([a-zA-Z]+)', expand=False), append=True)
   .subcode
   .unstack()
   .fillna('')                  # these last three 
   .reset_index()               # operations are  
   .rename_axis(None, axis=1)   # not important
)

   code     fruit     A     B     C
0   100   berries  100A  100B  100C
1   200  tropical  200A  200B      
2   300  tropical  300A            
3   400   berries  400A            
4   500      nuts  500A
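
The extracted letters work as column keys here because every subcode ends in a letter that is unique within its group. Assuming that cannot be relied on in other data, a sketch of the same set_index/unstack idea with a per-group counter from groupby(...).cumcount() could look like this (pos and wide are illustrative names, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['berries', 'berries', 'berries', 'tropical',
              'tropical', 'tropical', 'berries', 'nuts'],
    'code': [100, 100, 100, 200, 200, 300, 400, 500],
    'subcode': ['100A', '100B', '100C', '200A', '200B', '300A',
                '400A', '500A'],
})

# Number the rows within each (code, fruit) group: 0, 1, 2, ...
pos = df.groupby(['code', 'fruit']).cumcount()

# Use the counter as a third index level, then pivot it into columns.
wide = (df.set_index(['code', 'fruit', pos])['subcode']
          .unstack()                # counter level becomes the columns
          .add_prefix('subcode_')   # 0, 1, 2 -> subcode_0, subcode_1, ...
          .reset_index())

print(wide)
```

Groups with fewer subcodes get NaN in the unused columns; append .fillna('') if blanks are preferred.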

answered Jun 22, 2018 at 16:58

cs95


piRSquared · Accepted Answer · 2018-06-22 17:19:54Z

3

With defaultdict

from collections import defaultdict


d = defaultdict(list)

for f, c, s in df.itertuples(index=False):
    d[(f, c)].append(s)

pd.DataFrame.from_dict(
    {k: dict(enumerate(v)) for k, v in d.items()}, orient='index'
).add_prefix('subcode').rename_axis(['fruit', 'code']).reset_index()

      fruit  code subcode0 subcode1 subcode2
0   berries   100     100A     100B     100C
1   berries   400     400A      NaN      NaN
2      nuts   500     500A      NaN      NaN
3  tropical   200     200A     200B      NaN
4  tropical   300     300A      NaN      NaN
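
For readers new to defaultdict, here is a small pure-Python sketch of the grouping step on its own, using a hypothetical three-row sample instead of the full frame. It only illustrates how defaultdict(list) behaves and how dict(enumerate(...)) numbers the collected values; the plain-dict version is included for comparison.

```python
from collections import defaultdict

# Hypothetical sample rows in (fruit, code, subcode) order.
rows = [
    ('berries', 100, '100A'),
    ('berries', 100, '100B'),
    ('tropical', 200, '200A'),
]

# defaultdict(list) creates an empty list the first time a key is
# accessed, so append() needs no "is this key present?" check.
groups = defaultdict(list)
for fruit, code, subcode in rows:
    groups[(fruit, code)].append(subcode)

# The same bookkeeping with a plain dict, for comparison.
plain = {}
for fruit, code, subcode in rows:
    plain.setdefault((fruit, code), []).append(subcode)

# dict(enumerate(v)) turns each list into {position: value}, which is
# what lets DataFrame.from_dict line the values up into columns.
numbered = {k: dict(enumerate(v)) for k, v in groups.items()}

print(dict(groups))
print(numbered)
```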

answered Jun 22, 2018 at 17:19

piRSquared


1 Comment

imstuck Over a year ago

Thanks! I'll have to read up on defaultdict to see how it works. I definitely appreciate learning different approaches.
