Pandas dataframe reshape row values into new columns (matrix type format)

Question

I'm new to pandas and was looking for some advice on how to reshape my dataframe:

Currently, I have a dataframe like this.

panelist_id	type	type_count	refer_sm_count	refer_se_count	refer_non_n_count
1	HP	2	2	1	1
1	PB	1	0	1	0
1	TN	3	0	3	0
2	HP	1	1	0	0
2	PB	2	1	1	0

Ideally, I want my dataframe to look like this:

panelist_id	type_HP_count	type_PB_count	type_TN_count	refer_sm_count_HP	refer_se_count_HP	refer_non_n_count_HP	refer_sm_count_PB	refer_se_count_PB	refer_non_n_count_PB	refer_sm_count_TN	refer_se_count_TN	refer_non_n_count_TN
1	2	1	3	2	1	1	0	1	0	0	3	0
2	1	2	0	1	0	0	1	1	0	0	0	0

Basically, I need to convert the different values in the 'type' column into new columns that show the count for each type. The three 'refer' columns in the original df also need to be split by 'type', e.g., refer_sm_count_[type X], such as refer_sm_count_HP. Any help would be much appreciated. Thanks

Anurag Dabas · Accepted Answer · 2021-06-01 16:59:42Z

3

Try the pivot_table() and rename_axis() methods:

out = (df.pivot_table(index='panelist_id', columns='type', fill_value=0)  # one column per (original column, type) pair
         .rename_axis(columns=[None, None], index=None))  # drop the axis names

Finally, flatten the two column levels by mapping '_'.join over the .columns attribute:

out.columns=out.columns.map('_'.join)

Now, if you print out, you will get one row per panelist_id with flattened column names such as type_count_HP and refer_sm_count_HP.
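A self-contained version of these steps, rebuilding the question's sample table in memory (the values are copied from the question). pivot_table aggregates with the mean by default; since each (panelist_id, type) pair occurs only once here, the values pass through unchanged.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# Spread every remaining column across the 'type' values; the missing
# panelist 2 / TN combination becomes 0
out = (df.pivot_table(index='panelist_id', columns='type', fill_value=0)
         .rename_axis(columns=[None, None], index=None))

# Flatten ('type_count', 'HP') -> 'type_count_HP'
out.columns = out.columns.map('_'.join)

print(out)
```

pivot_table sorts the resulting columns, so the refer_* columns come before the type_count_* columns.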


2 Comments

metassi Over a year ago

Thanks, Anurag. I get an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects. Any ideas on how to fix it?

Anurag Dabas Over a year ago

pivot_table can handle this type of problem. What is your pandas version?
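The error text says pandas tried to reindex against an Index that contains duplicate labels. The real data is not shown in this thread, so the following is only a diagnostic sketch under that assumption: it prints the pandas version asked about above and looks for repeated column names and repeated (panelist_id, type) keys. The two-row frame with a doubled type_count column is a hypothetical stand-in; run the same checks on your own df.

```python
import pandas as pd

print(pd.__version__)

# Hypothetical frame with a repeated column label, standing in for real data
df = pd.DataFrame(
    [[1, 'HP', 2, 2], [1, 'PB', 1, 0]],
    columns=['panelist_id', 'type', 'type_count', 'type_count'],
)

# Column labels that appear more than once
dup_cols = df.columns[df.columns.duplicated()].tolist()
print('duplicate columns:', dup_cols)

# Rows whose (panelist_id, type) pair was already seen; pivot_table
# aggregates these, but set_index(...).unstack() raises a ValueError on them
dup_keys = df.duplicated(subset=['panelist_id', 'type']).sum()
print('duplicate key rows:', dup_keys)
```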

Henry Ecker · Accepted Answer · 2021-06-01 17:11:17Z

A pivot_wider option via pyjanitor:

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

new_df:

panelist_id  type_count_HP  type_count_PB  type_count_TN  refer_sm_count_HP  refer_sm_count_PB  refer_sm_count_TN  refer_se_count_HP  refer_se_count_PB  refer_se_count_TN  refer_non_n_count_HP  refer_non_n_count_PB  refer_non_n_count_TN
          1              2              1              3                  2                  0                  0                  1                  1                  3                     1                     0                     0
          2              1              2              0                  1                  1                  0                  0                  1                  0                     0                     0                     0

Complete Working Example:

import janitor
import pandas as pd

df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0]
})

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

print(new_df.to_string(index=False))

Nk03 · Accepted Answer · 2021-06-01 20:18:55Z

3

Just adding one more option:

df = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
df.columns = df.columns.map('_'.join)
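A runnable sketch of this option on the question's sample data. Unlike pivot_table, unstack does not aggregate, so it raises a ValueError if any (panelist_id, type) pair appears more than once. The question_style helper is a hypothetical addition, not part of the answer, that renames type_count_HP to the question's type_HP_count style.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# Move the 'type' level into the columns; the missing panelist 2 / TN
# combination is filled with 0. No aggregation happens here.
wide = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)

# Flatten ('refer_sm_count', 'HP') -> 'refer_sm_count_HP'
wide.columns = wide.columns.map('_'.join)


def question_style(name):
    """Hypothetical helper: 'type_count_HP' -> 'type_HP_count', others unchanged."""
    prefix = 'type_count_'
    if name.startswith(prefix):
        return 'type_' + name[len(prefix):] + '_count'
    return name


# Optional: rename the type counts to the naming used in the question
wide = wide.rename(columns=question_style)
print(wide.reset_index())
```

The columns keep the original column order (type counts first, then each refer column across HP, PB, TN), which differs slightly from the by-type layout sketched in the question.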

edited Jun 1, 2021 at 20:18

answered Jun 1, 2021 at 17:44


It_is_Chris · Accepted Answer · 2021-06-01 16:58:12Z

2

Use pivot_table to create MultiIndex columns:

df_p = df.pivot_table(index='panelist_id', columns='type', aggfunc='sum')

            refer_non_n_count           refer_se_count            \
type                       HP   PB   TN             HP   PB   TN   
panelist_id                                                        
1                         1.0  0.0  0.0            1.0  1.0  3.0   
2                         0.0  0.0  NaN            0.0  1.0  NaN   

            refer_sm_count           type_count            
type                    HP   PB   TN         HP   PB   TN  
panelist_id                                                
1                      2.0  0.0  0.0        2.0  1.0  3.0  
2                      1.0  1.0  NaN        1.0  2.0  NaN

If you want to flatten the columns, join the two levels of each column label:

df_p.columns = ['_'.join(col) for col in df_p.columns.values]
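The NaN entries in the output above appear because panelist 2 has no TN row. A sketch on the question's sample data that also passes fill_value=0 (not in the original answer) so those gaps become 0, then flattens the columns and moves panelist_id back into a regular column:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# aggfunc='sum' adds up duplicate (panelist_id, type) rows, if any;
# fill_value=0 replaces the missing panelist 2 / TN combination
df_p = df.pivot_table(index='panelist_id', columns='type',
                      aggfunc='sum', fill_value=0)

# Join the two column levels: ('type_count', 'PB') -> 'type_count_PB'
df_p.columns = ['_'.join(col) for col in df_p.columns.values]

# Turn panelist_id back into an ordinary column
df_p = df_p.reset_index()
print(df_p)
```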


1 Comment

metassi Over a year ago

Thanks for your help. This gives me an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Caíque Filipini · Accepted Answer · 2021-06-01 17:30:57Z

First, import the libraries:

import numpy as np
import pandas as pd

Then, read your data:

data = pd.read_excel('base.xlsx')

Reshape your data using pivot_table:

data_reshaped = pd.pivot_table(data, values=['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count'],
                               index=['panelist_id'], columns=['type'], aggfunc=np.sum)

The resulting columns have two levels (a MultiIndex) and panelist_id is the row index, so flatten the column names and reset the index:

columns = ['_'.join(col) for col in data_reshaped.columns]  # e.g. ('type_count', 'HP') -> 'type_count_HP'

data_reshaped.columns = columns  # assign the flat names to the dataframe
data_reshaped.reset_index(inplace=True)  # turn panelist_id back into a column
data_reshaped.fillna(0, inplace=True)  # replace NaN (a type missing for a panelist) with 0

Your data will then be in the wide format you want.
