
There seems to be an issue with the pandas replace() method when replacing values in only a few columns:

# Example dataframe:
import pandas as pd
a = pd.DataFrame(data={"x":[1,2,3,4,5], "y":[2,4,1,2,4], "z":["no", "yes", "no", "no", "no"], "t":["a", "b", "c", "d", "d"]})

# Try to replace the 2s inplace:
a.loc[:, ["x", "y"]].replace(2,-9999, inplace=True)

a is still:

Out[32]:
   x  y    z  t
0  1  2   no  a
1  2  4  yes  b
2  3  1   no  c
3  4  2   no  d
4  5  4   no  d

Note that I do not get a SettingWithCopyWarning, and I am using .loc as recommended. Since I use inplace=True, I would have expected the dataframe to change. Am I doing something wrong, or is this a bug to report on GitHub?

I am using pandas version 0.23.0.

    a.loc[:, ["x", "y"]] uses .loc.__getitem__. When using __getitem__, the returned object might be a copy. Here, a.loc[:, ["x", "y"]] returns a copy, and that copy is in fact modified in place, but since you didn't assign it to anything you cannot see the change. The original df remains unchanged. Commented Jun 13, 2018 at 14:04
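A minimal sketch of what the comment above describes, assuming a reasonably recent pandas install. The names subset, subset_changed and original_changed are illustrative and do not come from the original post. Keeping a reference to the selected object makes the in-place change visible:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 2, 4]})

# Selecting columns with a list goes through .loc.__getitem__ and
# gives back a new object rather than a window onto `a`.
subset = a.loc[:, ["x", "y"]]

# The in-place replace acts on `subset` only.
subset.replace(2, -9999, inplace=True)

# Check where the sentinel value ended up.
subset_changed = bool((subset == -9999).any().any())
original_changed = bool((a == -9999).any().any())

print(subset_changed, original_changed)
```

In the question's one-liner the selected object is never bound to a name, so the modified copy is discarded immediately.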

1 Answer


You cannot use inplace=True here, because selecting a subset returns a new object (a Series or DataFrame) whose data may be either a view or a copy of the parent. Modifying it in place does not always propagate back to the parent object. That is why the SettingWithCopyWarning exists (it raises instead if you set the mode.chained_assignment option to "raise"). You should never do this, nor is there ever a reason to. Assign the result back instead:

a.loc[:, ["x", "y"]] = a.loc[:, ["x", "y"]].replace(2,-9999)
print(a)
      x     y    z  t
0     1 -9999   no  a
1 -9999     4  yes  b
2     3     1   no  c
3     4 -9999   no  d
4     5     4   no  d

Another solution is update, which works in place by default:

a.update(a.loc[:, ["x", "y"]].replace(2,-9999))
print(a)
      x     y    z  t
0     1 -9999   no  a
1 -9999     4  yes  b
2     3     1   no  c
3     4 -9999   no  d
4     5     4   no  d
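
As a rough check, the sketch below applies both approaches to copies of the example frame and compares the results. It also illustrates one caveat worth keeping in mind: update is documented to use only non-NA values from the other frame, so it does not write NaN into the target. The names by_assign, by_update, with_nan and nan_ignored are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 2, 4],
                  "z": ["no", "yes", "no", "no", "no"]})

# Approach 1: assign the replaced subset back through .loc.
by_assign = a.copy()
by_assign.loc[:, ["x", "y"]] = by_assign.loc[:, ["x", "y"]].replace(2, -9999)

# Approach 2: let update() write the replaced values back in place.
by_update = a.copy()
by_update.update(by_update.loc[:, ["x", "y"]].replace(2, -9999))

# Caveat: NaN values in the argument are skipped by update(),
# so the 2s in `with_nan` are left as they were.
with_nan = a.copy()
with_nan.update(with_nan.loc[:, ["x", "y"]].replace(2, np.nan))
nan_ignored = with_nan["x"].tolist() == a["x"].tolist()

print(by_assign)
print(by_update)
print(nan_ignored)
```

If the replacement value may be NaN, the assignment form is likely the safer choice of the two.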

7 Comments

That was fast, and worked, thanks. Unfortunately, this means I'll have to type my indexing twice (right and left side), there is no way around this? My real indexing is unfortunately not as short as [:, ["x", "y"]] :-\
@Thomas Or you can save the indexing to a variable: idx = pd.IndexSlice[:, ['x', 'y']] and use it in loc: a.loc[idx] = ...
@jezrael: Great idea! I would not have thought of using update in this context, I only ever used it to replace NA values
@user2285236: I didn't even know about IndexSlice. This will be very helpful in the future. Really glad I asked this question.
@user2285236 - just test it: idx = pd.IndexSlice[[1,2], ['x', 'y']]; a.loc[idx] = 4
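
A short sketch of the IndexSlice suggestion from the comments, applied to the original replace problem. It assumes the example frame from the question; idx is the name used in the comment, and everything else follows the accepted answer. The indexer is written once and reused on both sides of the assignment:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 2, 4],
                  "z": ["no", "yes", "no", "no", "no"],
                  "t": ["a", "b", "c", "d", "d"]})

# Store the (possibly long) indexer once.
idx = pd.IndexSlice[:, ["x", "y"]]

# Reuse it for both the selection and the assignment target.
a.loc[idx] = a.loc[idx].replace(2, -9999)

print(a)
```

pd.IndexSlice here simply builds the same tuple that a.loc[:, ["x", "y"]] would receive, so the behaviour should match the answer's first solution.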