1

Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays1 = [
    ["A", "A", "A", "B", "B", "B"],
    ["foo", "bar", "baz", "foo", "bar", "baz"],
]
tuples1 = list(zip(*arrays1))
index_values1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(np.ones((6, 6)), index=index_values1, columns=index_values1)

print(df1)

         A              B          
       foo  bar  baz  foo  bar  baz
A foo  1.0  1.0  1.0  1.0  1.0  1.0
  bar  1.0  1.0  1.0  1.0  1.0  1.0
  baz  1.0  1.0  1.0  1.0  1.0  1.0
B foo  1.0  1.0  1.0  1.0  1.0  1.0
  bar  1.0  1.0  1.0  1.0  1.0  1.0
  baz  1.0  1.0  1.0  1.0  1.0  1.0
  

Say I want to replace the deepest level of either indices, columns, or both; as well as add a level, according to the following mapping:

d_idx = {
    'foo': ('qux', 'one'),
    'bar': ('quux', 'two'),
    'baz': ('corge', 'three'),
    }
    

Such that I get either:

                 A              B          
               foo  bar  baz  foo  bar  baz
A qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
B qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
  

or

         A               B           
       qux quux corge  qux quux corge
       one  two three  one  two three
A foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
B foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
  

or

                 A               B           
               qux quux corge  qux quux corge
               one  two three  one  two three
A qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0
B qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0

I have tried a number of ways, which seem to work but don't seem elegant to me. In particular, I want to make sure that this mapping is robust (e.g. no reordering due to unstacking indices etc.).

My first approach was by constructing a correspondance matrix:

     qux quux corge
     one  two three
foo  1.0  0.0   0.0
bar  0.0  1.0   0.0
baz  0.0  0.0   1.0

The advantage of this approach is that the mapping from one set of indices to the other is robust, by taking the dot product with this matrix and an unstacked df1. However by unstacking, the indices are implicitly sorted. This means I need to somehow retain the original order of the index before being able to take the dot product. While probably doable, this doesn't seem elegant to me.

df1_u = df1.unstack(level=0)

    A   A   A   A   A   A   B   B   B   B   B   B
    foo foo bar bar baz baz foo foo bar bar baz baz
    A   B   A   B   A   B   A   B   A   B   A   B
bar 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
foo 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Next, I tried changing the indices in a for-loop:

l_idx_old = list(df1.index)
l_idx_new = []

for t_idx_old in l_idx_old:
    idx0, idx1_old = t_idx_old
    idx1_new, idx2_new = d_idx[idx1_old]
    print(idx1_old, idx1_new, idx2_new)
    t_idx_new = (idx0, idx1_new, idx2_new)
    l_idx_new.append(t_idx_new)

df1.index = pd.MultiIndex.from_tuples(l_idx_new)

This works, however the last line is not robust, as there is no check whether the indices are assigned correctly.

In short, is there a robust and elegant way to carry out this intended mutation? Any help is much appreciated.

1 Answer 1

1

If mapping one level of MultiIndex get new MultiIndex, so possible append to existing by DataFrame.set_index with remove mapped level by DataFrame.droplevel:

d_idx = {
    'foo': ('qux', 'one'),
    'bar': ('quux', 'two'),
    'baz': ('corge', 'three'),
    }

df1 = df1.set_index(df1.index.get_level_values(1).map(d_idx), append=True).droplevel(1)
print (df1)
                 A              B          
               foo  bar  baz  foo  bar  baz
A qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
B qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0

For columns is it possible with helper DataFrame and same principe:

new = df1.columns.get_level_values(1).map(d_idx)

df1.columns = df1.columns.to_frame().set_index(new, append=True).droplevel(1).index

print (df1)

         A               B           
       qux quux corge  qux quux corge
       one  two three  one  two three
A foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
B foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
Sign up to request clarification or add additional context in comments.

6 Comments

Lightning fast as always, thanks @jezrael! If I understand correctly df1.index.get_level_values(1).map(d_idx) robustly maps the old to new index. Next, the new index is appended to the old index, after which the old index is dropped. How can this example be extended to the columns?
@BdB - Ya, not so easy, looking for it.
@BdB - yes, it is possible, I looking for solution without ttranspose
Thanks for the extension to columns! Now that you mention it, indeed I looked into casting indices to frame, but I got confused... Indeed it seems that the case for columns is less straightforward then the case for indices. Anyway, thanks a lot and happy holidays!
@BdB - It is more complicates, because set_index not working for MulitIndex in columns. Thank you, for you too.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.