I have two dataframes in which I am trying to replace a substring in level 1 of the MultiIndex with another substring, but this fails.

For example, I have a dataframe df:

Index0   Index1    0     1     2
A        BX       .2    .3    .9      
         CX       .34   .55   .54           

D        EX       .34   .44   .32
         FX       .43.  .88.  .06

I am trying to replace the substring X in Index1 with Y, so that my result
looks as follows:

Index0   Index1    0     1     2
A        BY       .2    .3    .9      
         CY       .34   .55   .54           

D        EY       .34   .44   .32
         FY       .43.  .88.  .06

I am using the following call:

df.replace('X','Y')

However, I get the following error:

AttributeError                   Traceback (most recent   call last)
<ipython-input-56-fc7014a2d950> in <module>()
  8 
  9 
---> 10 df.replace('X','Y')

AttributeError: 'MultiIndex' object has no attribute 'replace'
  • please add your code for creating df. Really, df looks like an Index Commented Sep 7, 2017 at 20:16
  • df is a dataframe. and Index0 and Index1 are the Index of df dataframe. Commented Sep 7, 2017 at 20:17
  • AttributeError Traceback (most recent call last) <ipython-input-56-fc7014a2d950> in <module>() 8 9 ---> 10 df.replace('X','Y') AttributeError: 'MultiIndex' object has no attribute 'replace' Commented Sep 7, 2017 at 20:19
  • Yes, and I want to modify the Multiindex substring with another substring as highlighted above but I am unable to do it and hence my question. Commented Sep 7, 2017 at 20:21

3 Answers

@cᴏʟᴅsᴘᴇᴇᴅ improved on my answer so I will leave just a slower alternate here...

import numpy as np
import pandas as pd

# Sample frame: level 0 is a/a/b/b, level 1 is aX, bX, cX, dX
df = pd.DataFrame(np.random.randn(4,3), 
                  index=[list('aabb'), [n + 'X' for n in list('abcd')]])

Here's an alternate method using reset_index. This would be applicable if you wanted to replace in more than one column. The trick is that you can't use replace on the Index so you have to "bring it into" the DataFrame.

new = (df.reset_index()
           .select_dtypes(include=['object'])
           .apply(lambda col: col.str.replace('X', 'Y')))

df.index = pd.MultiIndex.from_tuples(new.values.tolist())

3 Comments

Good approach - I was playing with one that used reset_index - pushing the index into editable dataframe columns - then using set_index to push those values back into the index. This is less invasive; more elegant.
I don't think using a series just to do a simple replacement is the best way to do this.
@Coldspeed no but isn't that catchy??

Or try this:

df.index=pd.MultiIndex.from_tuples([(x[0], x[1].replace('X', 'Y')) for x in df.index])
df
Out[304]: 
             0         1         2
a aY -0.696181 -1.929523 -1.903956
  bY  0.071061 -0.594185 -2.005251
b cY -0.097761  0.093667  1.780550
  dY  0.127887  1.534395  0.352351
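
If the explicit loop is a concern, a possible alternative is DataFrame.rename with its level argument, which applies a function to the labels of a single index level. This is only a sketch on a toy frame built like the one in the answers here; the name renamed is hypothetical, and the random values are just placeholders.

```python
import numpy as np
import pandas as pd

# Toy frame with the same index shape as in the answers (values are random).
df = pd.DataFrame(
    np.random.randn(4, 3),
    index=[list("aabb"), [n + "X" for n in "abcd"]],
)

# Apply a plain-Python string replacement to the labels of level 1 only.
# rename returns a new frame; the original df is left untouched.
renamed = df.rename(index=lambda label: label.replace("X", "Y"), level=1)

print(renamed.index.tolist())
```

Because rename returns a new object, assign the result back (df = renamed) if you want to keep the change.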

2 Comments

There is a loop here. But still, better.
The only solution I could understand ;-)

You're doing more than you need to.

df 
                  0     1     2
Index0 Index1                  
A      BX        .2    .3  0.90
       CX       .34   .55  0.54
D      EX       .34   .44  0.32
       FX      .43.  .88.  0.06

Use pd.MultiIndex.from_arrays and you can do this in one step.

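# Use get_level_values for both levels: index.levels holds only the unique
# labels, so it lines up with the rows only by coincidence in this example.
df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
                                       df.index.get_level_values(1).str.replace('X', 'Y')])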

df
                  0     1     2
Index0 Index1                  
A      BY        .2    .3  0.90
       CY       .34   .55  0.54
D      EY       .34   .44  0.32
       FY      .43.  .88.  0.06
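
A related sketch, assuming you would rather edit the stored labels than rebuild the whole index: MultiIndex.set_levels swaps the unique labels of one level and leaves the row-to-label mapping alone, so it should also work when labels repeat or the rows are unsorted. The frame below is hypothetical (numeric values made up, rows deliberately out of order) and only illustrates the difference between index.levels and get_level_values.

```python
import pandas as pd

# Hypothetical frame: rows are deliberately NOT in sorted label order.
idx = pd.MultiIndex.from_arrays(
    [["D", "D", "A", "A"], ["FX", "EX", "CX", "BX"]],
    names=["Index0", "Index1"],
)
df = pd.DataFrame({0: [0.43, 0.34, 0.34, 0.2]}, index=idx)

# levels[1] holds each unique label once; get_level_values(1) has one label per row.
unique_labels = list(df.index.levels[1])
row_labels = list(df.index.get_level_values(1))
print(unique_labels)
print(row_labels)

# Replace the substring in the unique labels, then put them back on level 1.
new_levels = df.index.levels[1].str.replace("X", "Y")
df.index = df.index.set_levels(new_levels, level=1)

print(df.index.tolist())
```

Note that set_levels expects the new labels of a level to stay unique, so this would not suit a replacement that makes two different labels collide.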

Performance

%%timeit
new = (df.reset_index()
            .select_dtypes(include=['object'])
            .apply(lambda col: col.str.replace('X', 'Y')))

df.index = pd.MultiIndex.from_tuples(new.values.tolist())

10 loops, best of 3: 93.5 ms per loop

Nearly 100ms for a tiny dataframe. Contrast with:

%%timeit
df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
                                        df.index.get_level_values(1).str.replace('X', 'Y')])

1000 loops, best of 3: 934 µs per loop
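
To repeat this kind of comparison outside IPython, something like the following sketch with the standard timeit module could be used. The helper names make_frame, via_tuples and via_arrays are hypothetical, the frame size is an arbitrary assumption, and the timings will depend on your machine and pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows=2000):
    """Build a hypothetical two-level frame whose level-1 labels end in 'X'."""
    half = n_rows // 2
    level0 = ["a"] * half + ["b"] * (n_rows - half)
    level1 = [f"{i}X" for i in range(n_rows)]
    index = pd.MultiIndex.from_arrays([level0, level1])
    return pd.DataFrame(np.random.randn(n_rows, 3), index=index)


def via_tuples(index):
    # Python-level loop over the (level0, level1) tuples.
    return pd.MultiIndex.from_tuples(
        [(a, b.replace("X", "Y")) for a, b in index]
    )


def via_arrays(index):
    # Vectorised string replacement on the per-row values of level 1.
    return pd.MultiIndex.from_arrays([
        index.get_level_values(0),
        index.get_level_values(1).str.replace("X", "Y"),
    ])


frame = make_frame()
for func in (via_tuples, via_arrays):
    seconds = timeit.timeit(lambda: func(frame.index), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call")
```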

1 Comment

You are right; I thought str.replace could not operate on an Index. This should be the accepted answer then, @user2560244.
