13

I have tried many times, but 'replace' does not seem to work after 'loc'. For example, I want to apply a regex replacement to 'conlumn_b' only in the rows where 'conlumn_a' is 'apple'.

Here is my sample code:

df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(r'^11*', 'XXX',inplace=True, regex=True)

Example:

conlumn_a       conlumn_b
apple           123
banana          11
apple           11
orange          33

The result I expect for 'df' is:

conlumn_a       conlumn_b
apple           123
banana          11
apple           XXX
orange          33

Has anyone run into this issue of needing 'replace' with a regex after 'loc'?

Or do you have any other good solutions?

Thank you so much for your help!

1
  • This question could be improved by providing code to produce the df. That code should make the numbers str rather than int types. Commented Jul 12, 2022 at 18:31

3 Answers

14

inplace=True modifies the object it is called on.

When you call .loc, you're slicing your DataFrame and getting back a new object.

>>> id(df)
4587248608

And,

>>> id(df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'])
4767716968

Calling an in-place replace on this new slice updates the slice itself, not the original DataFrame.
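To make this concrete, here is a minimal sketch of that behaviour. It assumes the column holds strings (the question's frame is rebuilt by hand here, and the names `before` and `subset` are only for illustration):

```python
import pandas as pd

# Hand-built copy of the question's frame; values stored as strings (assumption).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": ["123", "11", "11", "33"],
})
before = df.copy()

# Selecting with a boolean mask returns a new Series, not a window into df.
mask = df["conlumn_a"] == "apple"
subset = df.loc[mask, "conlumn_b"]

# The in-place replace changes `subset` only.
subset.replace(r"^11$", "XXX", regex=True, inplace=True)

print(subset.tolist())    # the selected rows, with the replacement applied
print(df.equals(before))  # whether the original frame is still untouched
```

The replacement lands in `subset`, while `df` still compares equal to the copy taken beforehand.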


Also note that you're calling replace on a column of ints, so nothing is going to happen: regular expressions only match strings.

Here's what I offer as a workaround: don't use regex at all.

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(11, 'XXX')

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Or, if you need regex-based substitution:

df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b']\
           .astype(str).replace('^11$', 'XXX', regex=True)

Note, however, that this converts your column to an object column.
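If the dtype change is acceptable anyway, one variation (my own sketch, not a requirement of the approach above) is to convert the whole column to strings up front, so the masked assignment never mixes ints and strings. The frame below is rebuilt by hand from the question:

```python
import pandas as pd

# Hand-built copy of the question's frame, with integer values as in the question.
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": [123, 11, 11, 33],
})

# Convert the whole column to strings first so the regex has something to match.
df["conlumn_b"] = df["conlumn_b"].astype(str)

# Filter on both sides and assign the replaced values back into the frame.
m = df["conlumn_a"] == "apple"
df.loc[m, "conlumn_b"] = df.loc[m, "conlumn_b"].replace(r"^11$", "XXX", regex=True)

print(df)
```

Only the 'apple' row whose value is exactly 11 is changed; the 'banana' row keeps its 11 because it is outside the mask.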


3 Comments

Thanks, I will take your suggestion. I do need regex in the 'replace': '11 to XXX' is only a sample, and the actual regex can be more complex. @cᴏʟᴅsᴘᴇᴇᴅ
Thank you for explaining the cause of the problem: I wasn't aware loc was creating a new object. Always good to understand the underlying cause.
I'm still confused. Isn't loc supposed to provide an assignable view? Why does replace work after loc when doing ordinary replacement, but not when doing regex replacement?

6

I'm going to borrow from a recent answer of mine. This technique is a general purpose strategy for updating a dataframe in place:

df.update(
    df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']
      .replace(r'^11$', 'XXX', regex=True)
)

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Note that all I did was remove inplace=True and wrap the expression in the pd.DataFrame.update method.
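For anyone who wants to run this end to end, here is a self-contained sketch. It assumes 'conlumn_b' already holds strings (otherwise the regex has nothing to match); the frame is rebuilt by hand and `fixed` is just an illustrative name:

```python
import pandas as pd

# Hand-built copy of the question's frame; values stored as strings (assumption).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": ["123", "11", "11", "33"],
})

# Build the replaced values for the 'apple' rows only. The Series keeps its
# name ('conlumn_b') and its index labels, which update() uses for alignment.
fixed = (
    df.loc[df["conlumn_a"] == "apple", "conlumn_b"]
      .replace(r"^11$", "XXX", regex=True)
)

# update() modifies df in place and returns None.
df.update(fixed)

print(df)
```

Rows missing from `fixed` are left alone, because update only overwrites with non-NA values that align on index and column name.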

3 Comments

@piRSquared - Why is update better? Is it faster? Or just nicer code?
Nicer code. Using it for its intended purpose. I show how it should be used to accomplish what people meant to accomplish with inplace=True but couldn't because they were performing it on a subsequent object.
This is actually nicer code, but sadly it is far less performant than the .loc alternative. I just tried it on 2M rows: it didn't finish within a minute and I had to stop it. Meanwhile, the "dirty" .loc alternative runs instantly.

3

I think you need to filter on both sides:

m = df['conlumn_a'] == 'apple'
df.loc[m,'conlumn_b'] = df.loc[m,'conlumn_b'].astype(str).replace(r'^(11+)','XXX',regex=True)
print (df)
  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

4 Comments

I have the same concern as you about 'update': whether it is faster or not. I think I'm going to test it.
@JonathanZhou - It's best to test it, but in my opinion it is slower.
@jezrael do you know why the df[df[col1=='aa']][col2] = df[df[col1=='aa']][col2].replace(map_dict) won't work?
@mingchau - The problem is chained indexing, check this - it replaces only the filtered Series, not the DataFrame slice.
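A small sketch of the difference, under the assumption that the column holds strings. The frame is rebuilt by hand, and `chained_result` / `loc_result` are names made up for this illustration:

```python
import warnings
import pandas as pd

# Hand-built copy of the question's frame; values stored as strings (assumption).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": ["123", "11", "11", "33"],
})
m = df["conlumn_a"] == "apple"

# Chained indexing: df[m] builds a temporary copy, and the assignment lands on
# that copy. pandas normally warns here; the warning is silenced only to keep
# the output tidy.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[m]["conlumn_b"] = df[m]["conlumn_b"].replace({"11": "XXX"})
chained_result = df["conlumn_b"].tolist()

# A single .loc call with row mask and column label assigns into df itself.
df.loc[m, "conlumn_b"] = df.loc[m, "conlumn_b"].replace({"11": "XXX"})
loc_result = df["conlumn_b"].tolist()

print(chained_result)
print(loc_result)
```

The chained version leaves df unchanged, while the single .loc assignment writes the replaced values back.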
