Reshaping a pandas DataFrame using pivot with multiple columns as the index

Question

I have a DataFrame like this (sample):

     A   B                           C   D            E
0   V1  B1                    Clearing  C1   1538884.46
1   V1  B1  CustomerPayment_Difference  C1  13537679.70
2   V1  B1                     Invoice  C1 -15771005.81
3   V1  B1           PaymentDifference  C1         0.00
4   V2  B2                    Clearing  C2    104457.22
5   V2  B2                     Invoice  C2   -400073.56
6   V2  B2                     Payment  C2    297856.45
7   V3  B3                    Clearing  C3   1989462.95
8   V3  B3                  CreditMemo  C3       538.95
9   V3  B3  CustomerPayment_Difference  C3   2112329.00
10  V3  B3                     Invoice  C3  -4066485.69
11  V4  B4                    Clearing  C4   -123946.13
12  V4  B4                  CreditMemo  C4    127624.66
13  V4  B4                  Accounting  C4    424774.52
14  V4  B4                     Invoice  C4 -40446521.41
15  V4  B4                     Payment  C4  44441419.95

I want to reshape this data frame like below:

   A  B  D    Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
  V1  B1 C1          NaN  1538884.46         NaN                  13537679.7   
  V2  B2 C2          NaN   104457.22         NaN                         NaN   
  V3  B3 C3          NaN  1989462.95      538.95                   2112329.0   
  V4  B4 C4    424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN

So far I have tried pivot: df.pivot(index='A', columns='C', values='E').reset_index()

It gives a result like this:

C   A  Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
0  V1         NaN  1538884.46         NaN                  13537679.7   
1  V2         NaN   104457.22         NaN                         NaN   
2  V3         NaN  1989462.95      538.95                   2112329.0   
3  V4   424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN

The table above leaves out columns B and D; I need those columns as well.

I have provided this sample data for simplicity, but in the future the data may also look like this:

     A   B                           C   D            E
0   V1  B1                    Clearing  C1   1538884.46
1   V1  B1  CustomerPayment_Difference  C1  13537679.70
2   V1  B1                     Invoice  C1 -15771005.81
3   V1  B1           PaymentDifference  C1         0.00
4   V1  B2                    Clearing  C1        88.90 <- new row
5   V1  B2                    Clearing  C2        79.90 <- new row

In this situation my code will throw a duplicate index error.

To fix these two problems I need to specify A, B and D as the index. I need code similar to this:

df.pivot(index=['A','B','D'],columns='C', values='E').reset_index()

But this code throws an error.

How can I solve this? How do I provide multiple columns as the index in a pandas pivot?

jezrael · Accepted Answer · 2018-03-16 07:02:58Z

I think you need:

df = df.set_index(['A','B','D', 'C'])['E'].unstack().reset_index()
print (df)
C   A   B   D  Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
0  V1  B1  C1         NaN  1538884.46         NaN                  13537679.7   
1  V2  B2  C2         NaN   104457.22         NaN                         NaN   
2  V3  B3  C3         NaN  1989462.95      538.95                   2112329.0   
3  V4  B4  C4   424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN
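
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes only a subset of the question's sample (the V1 and V2 rows), and the variable name `wide` is just an illustrative choice. The idea is that every key column plus C goes into the index, and unstack then spreads the last index level (C) across the columns.

```python
import pandas as pd

# Subset of the question's sample data (rows for V1 and V2 only).
df = pd.DataFrame({
    "A": ["V1", "V1", "V1", "V1", "V2", "V2", "V2"],
    "B": ["B1", "B1", "B1", "B1", "B2", "B2", "B2"],
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing", "Invoice", "Payment"],
    "D": ["C1", "C1", "C1", "C1", "C2", "C2", "C2"],
    "E": [1538884.46, 13537679.70, -15771005.81, 0.00,
          104457.22, -400073.56, 297856.45],
})

# Move the key columns plus C into the index, then spread C across columns.
wide = df.set_index(["A", "B", "D", "C"])["E"].unstack().reset_index()

# Drop the leftover "C" label on the column axis (cosmetic only).
wide.columns.name = None
print(wide)
```

Combinations that do not occur in the data, such as Payment for V1, come out as NaN.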

Another solution is to use pivot_table:

df = df.pivot_table(index=['A','B','D'], columns='C', values='E')

But it aggregates (by default with the mean) if there are duplicates in the A, B, C, D columns. The first solution raises an error if there are duplicates:

print (df)
    A   B                           C   D            E
0  V1  B1                    Clearing  C1      3000.00 <-V1,B1,Clearing,C1
1  V1  B1  CustomerPayment_Difference  C1  13537679.70
2  V1  B1                     Invoice  C1 -15771005.81
3  V1  B1           PaymentDifference  C1         0.00
4  V1  B1                    Clearing  C1      1000.00 <-V1,B1,Clearing,C1


df = df.set_index(['A','B','D', 'C'])['E'].unstack().reset_index()
print (df)

ValueError: Index contains duplicate entries, cannot reshape

But pivot_table aggregates:

df = df.pivot_table(index=['A','B','D'], columns='C', values='E')
print (df)

C         Clearing  CustomerPayment_Difference      Invoice  PaymentDifference
A  B  D                                                                       
V1 B1 C1    2000.0                  13537679.7 -15771005.81                0.0
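
The 2000.0 in the Clearing column is the mean of the two duplicated values, 3000 and 1000, because mean is the default aggregation of pivot_table. The sketch below rebuilds the duplicated sample and compares the default with an explicit aggfunc. Using sum is only an assumption for illustration; which aggregation is right depends on what the duplicates mean in your data.

```python
import pandas as pd

# The duplicated sample: two Clearing rows share the key V1/B1/C1.
df = pd.DataFrame({
    "A": ["V1"] * 5,
    "B": ["B1"] * 5,
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing"],
    "D": ["C1"] * 5,
    "E": [3000.00, 13537679.70, -15771005.81, 0.00, 1000.00],
})

# Default aggregation is the mean: (3000 + 1000) / 2.
mean_wide = df.pivot_table(index=["A", "B", "D"], columns="C", values="E")

# Passing aggfunc makes the choice explicit; here duplicates are summed.
sum_wide = df.pivot_table(index=["A", "B", "D"], columns="C", values="E",
                          aggfunc="sum")

print(mean_wide["Clearing"].iloc[0])
print(sum_wide["Clearing"].iloc[0])
```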

So the question is: is it a good idea to always use pivot_table?

In my opinion it depends on whether you need to care about duplicates. With pivot or set_index + unstack you get an error, so you know about the duplicates; pivot_table always aggregates, so the duplicates go unnoticed.
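
One way to get both behaviours is to look for duplicated keys first and only then decide how to reshape. This is a sketch rather than a recommendation: the names `key`, `dupes` and `wide` are illustrative, and falling back to a sum is an assumption that may not suit your data.

```python
import pandas as pd

# The duplicated sample: two Clearing rows share the key V1/B1/C1.
df = pd.DataFrame({
    "A": ["V1"] * 5,
    "B": ["B1"] * 5,
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing"],
    "D": ["C1"] * 5,
    "E": [3000.00, 13537679.70, -15771005.81, 0.00, 1000.00],
})

key = ["A", "B", "D", "C"]

# keep=False marks every row that shares a key, not only the later repeats.
dupes = df[df.duplicated(subset=key, keep=False)]

if dupes.empty:
    # No duplicates: the strict reshape is safe and loses nothing.
    wide = df.set_index(key)["E"].unstack().reset_index()
else:
    # Duplicates found: show them, then aggregate deliberately (sum is assumed here).
    print(dupes)
    wide = df.pivot_table(index=["A", "B", "D"], columns="C", values="E",
                          aggfunc="sum").reset_index()

print(wide)
```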

edited Mar 16, 2018 at 7:02

answered Mar 16, 2018 at 6:47

jezrael


4 Comments

Mohamed Thasin ah Over a year ago

It works fine :) but I'm wondering how to solve this with pivot? Can't I keep multiple columns as the index in pivot?

jezrael Over a year ago

@MohamedThasinah - It is possible to use pivot_table, but it aggregates if there are duplicates. df = df.pivot_table(index=['A','B','D'], columns='C', values='E')

Mohamed Thasin ah Over a year ago

Duplicates in the 'A','B','D' columns?

jezrael Over a year ago

@MohamedThasinah - Check sample, just added to answer.
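
On the question raised in the comments about passing several columns to pivot itself: as far as I know, DataFrame.pivot accepts a list for index starting with pandas 1.1, so on a recent version the call from the question should work as written. The sketch below assumes pandas 1.1 or later and uses the "future" sample from the question; `wide` is an illustrative name. Like set_index + unstack, pivot still raises ValueError when an A, B, D, C combination is duplicated.

```python
import pandas as pd

# The "future" sample from the question: V1 now appears with several B/D values.
df = pd.DataFrame({
    "A": ["V1"] * 6,
    "B": ["B1", "B1", "B1", "B1", "B2", "B2"],
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing", "Clearing"],
    "D": ["C1", "C1", "C1", "C1", "C1", "C2"],
    "E": [1538884.46, 13537679.70, -15771005.81, 0.00, 88.9, 79.9],
})

# Assumes pandas >= 1.1, where index may be a list of column names.
wide = df.pivot(index=["A", "B", "D"], columns="C", values="E").reset_index()
print(wide)
```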
