2

I'm new to pandas and was looking for some advice on how to reshape my dataframe:

Currently, I have a dataframe like this.

panelist_id type type_count refer_sm_count refer_se_count refer_non_n_count
1 HP 2 2 1 1
1 PB 1 0 1 0
1 TN 3 0 3 0
2 HP 1 1 0 0
2 PB 2 1 1 0

Ideally, I want my dataframe to look like this:

panelist_id type_HP_count type_PB_count type_TN_count refer_sm_count_HP refer_se_count_HP refer_non_n_count_HP refer_sm_count_PB refer_se_count_PB refer_non_n_count_PB refer_sm_count_TN refer_se_count_TN refer_non_n_count_TN
1 2 1 3 2 1 0 0 1 0 0 0 0
2 1 2 0 1 0 0 1 1 0 0 0 0

Basically, I need to convert the distinct values in the 'type' column into new columns showing the count for each type. The three 'refer' columns in the original df also need to be split out by 'type', e.g., refer_sm_count_[type X (e.g., HP)]. Any help would be much appreciated. Thanks

5 Answers

3

Try the pivot_table() and rename_axis() methods:

out=(df.pivot_table(index='panelist_id',columns='type',fill_value=0)
      .rename_axis(columns=[None,None],index=None))

Finally, flatten the two-level column index by applying map() to the .columns attribute:

out.columns=out.columns.map('_'.join)

Now, if you print out, you will get your desired output.
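For a self-contained check, here is a minimal sketch of this approach on the question's sample rows. It assumes each (panelist_id, type) pair appears only once, passes aggfunc='sum' explicitly (the answer relies on the default, which is the mean), and the final rename to the type_HP_count style is an illustrative extra rather than part of the answer:

```python
import pandas as pd

# Sample rows taken from the table in the question.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# One row per panelist, one column per (measure, type) pair;
# missing combinations (panelist 2 has no TN row) become 0.
out = df.pivot_table(index='panelist_id', columns='type',
                     aggfunc='sum', fill_value=0)

# Flatten the two-level column index into 'measure_type' names.
out.columns = out.columns.map('_'.join)

# Optional: turn 'type_count_HP' into 'type_HP_count', as the question asks.
out = out.rename(columns=lambda c: c.replace('type_count_', 'type_') + '_count'
                 if c.startswith('type_count_') else c)

# Move panelist_id back from the index to an ordinary column.
out = out.reset_index()
print(out.to_string(index=False))
```

The columns come out grouped by measure in alphabetical order; reorder them with out[[...]] if the exact layout from the question matters.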


2 Comments

Thanks Anurag - I get an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects. Any ideas on how to fix it?
pivot_table can handle this type of problem. What is your pandas version?
3

A pivot_wider option via pyjanitor:

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

new_df:

panelist_id  type_count_HP  type_count_PB  type_count_TN  refer_sm_count_HP  refer_sm_count_PB  refer_sm_count_TN  refer_se_count_HP  refer_se_count_PB  refer_se_count_TN  refer_non_n_count_HP  refer_non_n_count_PB  refer_non_n_count_TN
          1              2              1              3                  2                  0                  0                  1                  1                  3                     1                     0                     0
          2              1              2              0                  1                  1                  0                  0                  1                  0                     0                     0                     0

Complete Working Example:

import janitor
import pandas as pd

df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0]
})

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

print(new_df.to_string(index=False))


3

Just adding one more option:

df = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
df.columns = df.columns.map('_'.join)
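One difference from the pivot_table answers is worth knowing: unstack() cannot reshape an index that contains duplicate entries, whereas pivot_table() aggregates them. The sketch below uses made-up rows (not the question's data) in which the pair (1, 'HP') appears twice:

```python
import pandas as pd

# Hypothetical data: the (panelist_id, type) pair (1, 'HP') is duplicated.
df = pd.DataFrame({
    'panelist_id': [1, 1, 2],
    'type': ['HP', 'HP', 'PB'],
    'type_count': [2, 5, 1],
})

# unstack() refuses to reshape when the index has duplicate entries.
try:
    df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
    unstack_error = None
except ValueError as exc:
    unstack_error = exc
print('unstack failed:', unstack_error)

# pivot_table() combines the duplicates with the chosen aggregation instead.
summed = df.pivot_table(index='panelist_id', columns='type',
                        values='type_count', aggfunc='sum', fill_value=0)
print(summed)
```

If the data is guaranteed to have one row per pair, the set_index/unstack route is fine; otherwise decide on an aggregation and use pivot_table.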


2

Use pivot_table to create a MultiIndex on the columns:

df_p = df.pivot_table(index='panelist_id', columns='type', aggfunc='sum')

            refer_non_n_count           refer_se_count            \
type                       HP   PB   TN             HP   PB   TN   
panelist_id                                                        
1                         1.0  0.0  0.0            1.0  1.0  3.0   
2                         0.0  0.0  NaN            0.0  1.0  NaN   

            refer_sm_count           type_count            
type                    HP   PB   TN         HP   PB   TN  
panelist_id                                                
1                      2.0  0.0  0.0        2.0  1.0  3.0  
2                      1.0  1.0  NaN        1.0  2.0  NaN 

If you want to flatten the columns, join the two levels:

df_p.columns = ['_'.join(col) for col in df_p.columns.values]

1 Comment

Thanks for your help. This returns me an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects
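The thread never establishes what triggers this error. One thing worth ruling out (an assumption, not a confirmed diagnosis) is repeated column labels in the input frame, which pandas permits but which operations that reindex on those labels reject. A minimal sketch of the check, using a made-up frame with a duplicated type_count column:

```python
import pandas as pd

# Hypothetical frame in which the label 'type_count' appears twice.
df = pd.DataFrame([[1, 'HP', 2, 2]],
                  columns=['panelist_id', 'type', 'type_count', 'type_count'])

# False means at least one column label is repeated.
print(df.columns.is_unique)

# List the labels that occur more than once.
dupes = df.columns[df.columns.duplicated()].tolist()
print(dupes)

# Keep only the first occurrence of each label before reshaping.
deduped = df.loc[:, ~df.columns.duplicated()]
print(list(deduped.columns))
```

If is_unique is already True on the real data, the cause lies elsewhere and the pandas version asked about in the earlier comment is the next thing to check.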
2

First, import libs:

import numpy as np
import pandas as pd

Then, read your data:

data = pd.read_excel('base.xlsx')

Reshape your data using pivot_table:

data_reshaped = pd.pivot_table(data, values=['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count'],
                               index=['panelist_id'], columns=['type'], aggfunc=np.sum)

The result has two-level column names and panelist_id as the index, so flatten the names and reset the index:

columns = ['_'.join(col) for col in data_reshaped.columns] # build flat names such as 'type_count_HP'

data_reshaped.columns = columns # assign the new column names to the dataframe
data_reshaped.reset_index(inplace=True) # turn panelist_id back into a regular column
data_reshaped.fillna(0, inplace=True) # replace NaN with 0

Your data will then be in the desired shape.

