Pandas: how can 'replace' work after 'loc'?

Question

I have tried many times, but it seems that 'replace' does not work after using 'loc'. For example, I want to apply a regex replacement to 'conlumn_b', but only for the rows where the 'conlumn_a' value is 'apple'.

Here is my sample code:

df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(r'^11*', 'XXX',inplace=True, regex=True)

Example:

conlumn_a       conlumn_b
apple           123
banana          11
apple           11
orange          33

The result I expect for 'df' is:

conlumn_a       conlumn_b
apple           123
banana          11
apple           XXX
orange          33

Has anyone run into this issue, needing 'replace' with a regex after 'loc'?

Or do you have some other good solutions?

Thank you so much for your help!

This question could be improved by providing code to produce the df. That code should make the numbers str rather than int types.
– Josiah Yoder, Commented Jul 12, 2022 at 18:31

cs95 · Accepted Answer · 2018-01-18 07:15:03Z

14

inplace=True modifies the object that the method was called on.

When you call .loc with a boolean mask, you're slicing your dataframe object and getting back a new object, not the original.

>>> id(df)
4587248608

And,

>>> id(df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'])
4767716968

Now, calling an in-place replace on this new slice will apply the replace operation, updating the new slice itself, and not the original.
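To make this concrete, here is a minimal sketch. The sample frame is an assumption (the question does not show how df was built), and conlumn_b is stored as strings so that the regex has something to match. The slice is bound to a name, picked, which is also just an illustrative choice.

```python
import pandas as pd

# Assumed sample data: conlumn_b holds strings so a regex can match.
df = pd.DataFrame({
    'conlumn_a': ['apple', 'banana', 'apple', 'orange'],
    'conlumn_b': ['123', '11', '11', '33'],
})

# Boolean-mask selection with .loc hands back a new Series.
picked = df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']

# The in-place replace changes only that new Series.
picked.replace(r'^11$', 'XXX', regex=True, inplace=True)

print(picked.tolist())           # the new Series was modified
print(df['conlumn_b'].tolist())  # the original column is untouched
```

The replacement did happen, but only on the temporary object, which is why df looks unchanged afterwards.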

Also note that you're calling a regex replace on a column of int, where nothing is going to happen, because regular expressions only match strings.

Here's what I offer you as a workaround. Don't use regex at all.

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(11, 'XXX')

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Or, if you need regex based substitution, then -

df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b']\
           .astype(str).replace('^11$', 'XXX', regex=True)

Although, this converts your column to an object column.
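As a hedged variation on the regex approach: recent pandas releases may warn about, or reject, writing strings into an integer column through .loc. One way around that is to convert the whole column to strings first and only then do the masked assignment. The sample data below is assumed, since the question does not include code to build df.

```python
import pandas as pd

# Assumed sample data with integer values in conlumn_b.
df = pd.DataFrame({
    'conlumn_a': ['apple', 'banana', 'apple', 'orange'],
    'conlumn_b': [123, 11, 11, 33],
})

# Convert the whole column up front so the later assignment does not
# mix strings into an integer column.
df['conlumn_b'] = df['conlumn_b'].astype(str)

# Filter on both sides: select the rows, replace, and assign back.
m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(r'^11$', 'XXX', regex=True)

print(df)
```

The banana row keeps its '11' because the mask excludes it from both the selection and the assignment.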

edited Jan 18, 2018 at 7:15

answered Jan 18, 2018 at 6:30

cs95


3 Comments

Jonathan Zhou Over a year ago

Thanks, I will take your suggestion. The situation is that I need regex in the 'replace'; the '11 to XXX' case is only a sample, and the actual regex can be more complex. @cᴏʟᴅsᴘᴇᴇᴅ

JSB Over a year ago

Thank you for explaining the cause of the problem: I wasn't aware loc was creating a new object. Always good to understand the underlying cause.

Josiah Yoder Over a year ago

I'm still confused. Isn't loc supposed to provide an assignable view? Why does replace work after loc when doing ordinary replacement, but not when doing regex replacement?

piRSquared · Answer · 2018-01-18 07:18:38Z

6

I'm going to borrow from a recent answer of mine. This technique is a general purpose strategy for updating a dataframe in place:

df.update(
    df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']
      .replace(r'^11$', 'XXX', regex=True)
)

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Note that all I did was remove the inplace=True and instead wrapped it in the pd.DataFrame.update method.
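For anyone who wants to run this end to end, here is a self-contained sketch. It assumes conlumn_b is stored as strings (the question does not show how df was built), because the regex only matches string values. The intermediate name fixed is illustrative.

```python
import pandas as pd

# Assumed sample data: conlumn_b holds strings so the regex can match.
df = pd.DataFrame({
    'conlumn_a': ['apple', 'banana', 'apple', 'orange'],
    'conlumn_b': ['123', '11', '11', '33'],
})

# Build the corrected values for the 'apple' rows only. The result is a
# Series named 'conlumn_b' that keeps the original row labels.
fixed = df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(
    r'^11$', 'XXX', regex=True
)

# update aligns on index and column name and overwrites df in place.
df.update(fixed)

print(df)
```

Rows that are absent from fixed, such as the banana row, are left as they were.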

edited Jan 18, 2018 at 7:18

Bharath M Shetty


answered Jan 18, 2018 at 6:42

piRSquared


3 Comments

jezrael Over a year ago

@piRSquared - Why is update better? Is it faster, or only nicer code?

piRSquared Over a year ago

Nicer code. Using it for its intended purpose. I show how it should be used to accomplish what people meant to accomplish with inplace=True but couldn't because they were performing it on a subsequent object.

tricky Over a year ago

This is actually nicer code, but sadly it performs far worse than the .loc alternative. I just tried it on 2M rows: it didn't finish within a minute and I had to stop it, while the "dirty" .loc alternative ran instantly.

jezrael · Answer · 2018-01-18 06:29:17Z

3

I think you need to filter on both sides:

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].astype(str).replace(r'^(11+)', 'XXX', regex=True)
print(df)
  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

answered Jan 18, 2018 at 6:29

jezrael


4 Comments

Jonathan Zhou Over a year ago

But I have the same concern as you about 'update': whether it's faster or not. I think I am going to test it.

jezrael Over a year ago

@JonathanZhou - Best to test it, but in my opinion it is slower.

mingchau Over a year ago

@jezrael do you know why df[df[col1]=='aa'][col2] = df[df[col1]=='aa'][col2].replace(map_dict) won't work?

jezrael Over a year ago

@mingchau - The problem is chained indexing: it replaces values only in the filtered copy, not in the original DataFrame.
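A small sketch of the difference, using hypothetical data: the frame, the column names col1 and col2, and map_dict are all made up for illustration. pandas warns about the chained pattern, so the warning is silenced here only to keep the output clean.

```python
import warnings

import pandas as pd

# Hypothetical data and mapping, for illustration only.
df = pd.DataFrame({'col1': ['aa', 'bb', 'aa'], 'col2': ['x', 'x', 'x']})
map_dict = {'x': 'y'}
mask = df['col1'] == 'aa'

# Chained indexing: df[mask] is a copy, so the assignment lands on that
# temporary copy and df itself stays unchanged.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df[mask]['col2'] = df[mask]['col2'].replace(map_dict)
chained_result = df['col2'].tolist()

# A single .loc call on the left-hand side writes into df itself.
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].replace(map_dict)
loc_result = df['col2'].tolist()

print(chained_result)
print(loc_result)
```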
