I have a DataFrame like this (sample):

     A   B                           C   D            E
0   V1  B1                    Clearing  C1   1538884.46
1   V1  B1  CustomerPayment_Difference  C1  13537679.70
2   V1  B1                     Invoice  C1 -15771005.81
3   V1  B1           PaymentDifference  C1         0.00
4   V2  B2                    Clearing  C2    104457.22
5   V2  B2                     Invoice  C2   -400073.56
6   V2  B2                     Payment  C2    297856.45
7   V3  B3                    Clearing  C3   1989462.95
8   V3  B3                  CreditMemo  C3       538.95
9   V3  B3  CustomerPayment_Difference  C3   2112329.00
10  V3  B3                     Invoice  C3  -4066485.69
11  V4  B4                    Clearing  C4   -123946.13
12  V4  B4                  CreditMemo  C4    127624.66
13  V4  B4                  Accounting  C4    424774.52
14  V4  B4                     Invoice  C4 -40446521.41
15  V4  B4                     Payment  C4  44441419.95

I want to reshape this DataFrame into the following:

   A  B  D    Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
  V1  B1 C1          NaN  1538884.46         NaN                  13537679.7   
  V2  B2 C2          NaN   104457.22         NaN                         NaN   
  V3  B3 C3          NaN  1989462.95      538.95                   2112329.0   
  V4  B4 C4    424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN 

So far I have tried pivot: df.pivot(index='A', columns='C', values='E').reset_index()

It gives this result:

C   A  Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
0  V1         NaN  1538884.46         NaN                  13537679.7   
1  V2         NaN   104457.22         NaN                         NaN   
2  V3         NaN  1989462.95      538.95                   2112329.0   
3  V4   424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN

The table above drops columns B and D, but I need those columns as well.

I have provided this sample data for simplicity, but in the future the data may also look like this:

     A   B                           C   D            E
0   V1  B1                    Clearing  C1   1538884.46
1   V1  B1  CustomerPayment_Difference  C1  13537679.70
2   V1  B1                     Invoice  C1 -15771005.81
3   V1  B1           PaymentDifference  C1         0.00
4   V1  B2                    Clearing  C1         88.9
5   V1  B2                    Clearing  C2         79.9

With rows 4 and 5 added, my code throws a duplicate index error.
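A minimal sketch reproducing the error, assuming the second sample is typed in by hand from the rows above (the name error is only for illustration). Because column A alone no longer identifies a row, pivot finds three Clearing values for V1 and cannot place them in one cell:

```python
import pandas as pd

# Second sample from the question: rows 4 and 5 add more V1 / Clearing entries
df = pd.DataFrame({
    "A": ["V1"] * 6,
    "B": ["B1", "B1", "B1", "B1", "B2", "B2"],
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing", "Clearing"],
    "D": ["C1", "C1", "C1", "C1", "C1", "C2"],
    "E": [1538884.46, 13537679.70, -15771005.81, 0.00, 88.9, 79.9],
})

try:
    # (A, C) = (V1, Clearing) occurs three times, so pivot cannot reshape
    df.pivot(index="A", columns="C", values="E")
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```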

To fix both problems, I need to specify A, B and D as the index, with code similar to this:

df.pivot(index=['A','B','D'],columns='C', values='E').reset_index()

This code throws an error.
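For reference: as far as I know, pandas 1.1 and later accept a list for the index argument of DataFrame.pivot, so the call above runs there unchanged, and the error comes from older versions. A sketch assuming pandas >= 1.1 and the second sample from above (the name wide is illustrative):

```python
import pandas as pd

# Second sample from the question (assumes pandas >= 1.1 for a list-valued index)
df = pd.DataFrame({
    "A": ["V1"] * 6,
    "B": ["B1", "B1", "B1", "B1", "B2", "B2"],
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing", "Clearing"],
    "D": ["C1", "C1", "C1", "C1", "C1", "C2"],
    "E": [1538884.46, 13537679.70, -15771005.81, 0.00, 88.9, 79.9],
})

# A, B and D together identify each output row; C values become the new columns
wide = df.pivot(index=["A", "B", "D"], columns="C", values="E").reset_index()
print(wide)
```

On this sample the (A, B, D, C) combinations are unique, so no duplicate error is raised.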

How can I solve this? How do I provide multiple columns as the index in a pandas pivot?

1 Answer

I think you need:

df = df.set_index(['A','B','D', 'C'])['E'].unstack().reset_index()
print (df)
C   A   B   D  Accounting    Clearing  CreditMemo  CustomerPayment_Difference  \
0  V1  B1  C1         NaN  1538884.46         NaN                  13537679.7   
1  V2  B2  C2         NaN   104457.22         NaN                         NaN   
2  V3  B3  C3         NaN  1989462.95      538.95                   2112329.0   
3  V4  B4  C4   424774.52  -123946.13   127624.66                         NaN   

C      Invoice      Payment  PaymentDifference  
0 -15771005.81          NaN                0.0  
1   -400073.56    297856.45                NaN  
2  -4066485.69          NaN                NaN  
3 -40446521.41  44441419.95                NaN  
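A self-contained version of the same reshape on the first sample, assuming the data is typed in by hand. The extra rename_axis(None, axis=1) call is optional; it only removes the leftover C label that appears in the top-left corner of the printed result:

```python
import pandas as pd

# First sample from the question
df = pd.DataFrame({
    "A": ["V1"] * 4 + ["V2"] * 3 + ["V3"] * 4 + ["V4"] * 5,
    "B": ["B1"] * 4 + ["B2"] * 3 + ["B3"] * 4 + ["B4"] * 5,
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice", "PaymentDifference",
          "Clearing", "Invoice", "Payment",
          "Clearing", "CreditMemo", "CustomerPayment_Difference", "Invoice",
          "Clearing", "CreditMemo", "Accounting", "Invoice", "Payment"],
    "D": ["C1"] * 4 + ["C2"] * 3 + ["C3"] * 4 + ["C4"] * 5,
    "E": [1538884.46, 13537679.70, -15771005.81, 0.00,
          104457.22, -400073.56, 297856.45,
          1989462.95, 538.95, 2112329.00, -4066485.69,
          -123946.13, 127624.66, 424774.52, -40446521.41, 44441419.95],
})

wide = (
    df.set_index(["A", "B", "D", "C"])["E"]  # Series keyed by all four columns
      .unstack()                             # move the last level (C) into columns
      .reset_index()                         # turn A, B, D back into ordinary columns
      .rename_axis(None, axis=1)             # drop the "C" name on the columns
)
print(wide)
```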

Another solution is to use pivot_table:

df = df.pivot_table(index=['A','B','D'], columns='C', values='E')

But it aggregates if there are duplicates in columns A, B, C and D. The first solution raises an error if there are duplicates:

print (df)
    A   B                           C   D            E
0  V1  B1                    Clearing  C1      3000.00 <-V1,B1,Clearing,C1
1  V1  B1  CustomerPayment_Difference  C1  13537679.70
2  V1  B1                     Invoice  C1 -15771005.81
3  V1  B1           PaymentDifference  C1         0.00
4  V1  B1                    Clearing  C1      1000.00 <-V1,B1,Clearing,C1


df = df.set_index(['A','B','D', 'C'])['E'].unstack().reset_index()
print (df)

ValueError: Index contains duplicate entries, cannot reshape

But pivot_table aggregates; by default it takes the mean, so 3000 and 1000 become 2000:

df = df.pivot_table(index=['A','B','D'], columns='C', values='E')
print (df)

C         Clearing  CustomerPayment_Difference      Invoice  PaymentDifference
A  B  D                                                                       
V1 B1 C1    2000.0                  13537679.7 -15771005.81                0.0
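The aggregation can be made explicit with the aggfunc parameter of pivot_table. A sketch on the duplicate sample above (variable names are illustrative); aggfunc='count' is one way to see how many rows landed in each cell:

```python
import pandas as pd

# Duplicate sample from the answer: (V1, B1, Clearing, C1) appears twice
df = pd.DataFrame({
    "A": ["V1"] * 5,
    "B": ["B1"] * 5,
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing"],
    "D": ["C1"] * 5,
    "E": [3000.00, 13537679.70, -15771005.81, 0.00, 1000.00],
})

keys = ["A", "B", "D"]
mean_wide = df.pivot_table(index=keys, columns="C", values="E")  # default aggfunc: mean
sum_wide = df.pivot_table(index=keys, columns="C", values="E", aggfunc="sum")
count_wide = df.pivot_table(index=keys, columns="C", values="E", aggfunc="count")

print(mean_wide.reset_index())
print(count_wide)
```

A count greater than 1 points at exactly the cells where pivot would have raised the duplicate error.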

So the question is: is it a good idea to always use pivot_table?

In my opinion, it depends on whether you need to care about duplicates. With pivot or set_index + unstack you get an error, so you know about the duplicates; pivot_table always aggregates, so you get no warning about them.
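If you want to know about duplicates before deciding how to handle them, one option is to check them explicitly with DataFrame.duplicated before reshaping. A sketch; pivot_checked is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def pivot_checked(df, index, columns, values):
    """Hypothetical helper: reshape only if every (index..., columns) key is unique.

    index is expected to be a list of column names, columns a single name.
    """
    keys = list(index) + [columns]
    # keep=False marks every row of a duplicated group, not only the repeats
    dupes = df[df.duplicated(subset=keys, keep=False)]
    if not dupes.empty:
        raise ValueError(f"duplicate key rows:\n{dupes}")
    return df.set_index(keys)[values].unstack().reset_index()


df = pd.DataFrame({
    "A": ["V1"] * 5,
    "B": ["B1"] * 5,
    "C": ["Clearing", "CustomerPayment_Difference", "Invoice",
          "PaymentDifference", "Clearing"],
    "D": ["C1"] * 5,
    "E": [3000.00, 13537679.70, -15771005.81, 0.00, 1000.00],
})

try:
    pivot_checked(df, index=["A", "B", "D"], columns="C", values="E")
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # shows rows 0 and 4

# After dropping the repeated row, the reshape succeeds
clean = pivot_checked(df.drop(index=4), index=["A", "B", "D"], columns="C", values="E")
print(clean)
```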

4 Comments

It works fine :) but I'm wondering how to solve this with a pivot table. Can't I use multiple columns as the index in pivot?
@MohamedThasinah - It is possible to use pivot_table, but it aggregates duplicates: df = df.pivot_table(index=['A','B','D'], columns='C', values='E')
Duplicates in the 'A','B','D' columns?
@MohamedThasinah - Check the sample I just added to the answer.
