2

I am trying to transform this DataFrame:

>>> df
D1     D2  
S1 S2  S1 S2
0  -4  1  -6
1  -5  2  -7
3  -8  3  -9

to something like:

>>> df
    0    1    2    3
D1  -4   -5   NaN  -8
D2  NaN  -6   -7   -9

Essentially the values in columns (D1,S1) and (D2,S1) need to be merged into column names and the values in columns (D1,S2) and (D2,S2) become the only data values.

How could I achieve that simply and efficiently ?

1 Answer 1

3

The desired DataFrame have the Ds in the index. stack moves column level values into the index. So it is natural to think about calling stack:

In [43]: result = df.stack(level=0); result
Out[43]: 
      S1  S2
0 D1   0  -4
  D2   1  -6
1 D1   1  -5
  D2   2  -7
2 D1   3  -8
  D2   3  -9

The desired DataFrame only has Ds in the index, so let's drop the outer level values:

In [44]: result.index = result.index.droplevel(0); result
Out[44]: 
    S1  S2
D1   0  -4
D2   1  -6
D1   1  -5
D2   2  -7
D1   3  -8
D2   3  -9

We now want the S1 column to become a column level value. That can be done by first moving the S1 column into the index,

In [45]: result = result.set_index('S1', append=True); result
Out[45]: 
       S2
   S1    
D1 0   -4
D2 1   -6
D1 1   -5
D2 2   -7
D1 3   -8
D2 3   -9

and then calling unstack to move the index level to a column level:

In [46]: result = result.unstack('S1'); result
Out[46]: 
    S2          
S1   0  1   2  3
D1  -4 -5 NaN -8
D2 NaN -6  -7 -9

Putting it all together,

import pandas as pd
df = pd.DataFrame({('D1', 'S1'): [0, 1, 3],
 ('D1', 'S2'): [-4, -5, -8],
 ('D2', 'S1'): [1, 2, 3],
 ('D2', 'S2'): [-6, -7, -9]})

result = df.stack(level=0)
result.index = result.index.droplevel(0)
result = result.set_index('S1', append=True)
result = result.unstack('S1')
result.columns = result.columns.droplevel(0)
print(result)

yields

S1   0  1   2  3
D1  -4 -5 NaN -8
D2 NaN -6  -7 -9

For reference: The four fundamental DataFrame reshaping operations are

They are fundamental because a DataFrame is just an array with index and column levels. These operators allow you to move values from the array to index levels to column levels and back. Combining of these 4 operations in various ways can solve a lot of DataFrame reshaping problems.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.