I have a large DataFrame that I pulled from an ODBC database. It has multiple columns, and I'm trying to change the values of one column by filtering on two others. First, I filter my DataFrame data_prem with both conditions, which gives me the correct rows:

data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))]

Then I call replace on the selection to change the value 'M' to 'H':

data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))]['Reinsurer'].replace(to_replace='M',value='H',inplace=True,regex=True)

Python warns me that I'm trying to modify a copy of the DataFrame, even though I'm clearly referring to the original one (I'm posting an image so you can see my results).

[Image: dataframe filtering]

I also tried using .loc in the following manner:

data_prem.loc[((data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))),'Reinsurer'] = 'H'

which changed all rows that fit the second condition (str.contains...), but it didn't apply the first condition. I got replacements in the 'Reinsurer' column for other 'PRODUCT_NAME' values as well.

I've been scouring the web for an answer to this for some time. I've seen some mentions of a bug in the pandas library, not sure if this is what they were talking about.

I would value any opinions you might have, and I would also be interested in alternative ways of solving this problem. I filled the 'Reinsurer' column using map with 'PRODUCT_NAME' as the input (I had a dictionary that connected all 'PRODUCT_NAME' values with 'Reinsurer' values).

  • No, I take the data off a server and create an Excel report. Commented Dec 10, 2018 at 9:53
  • It looks very strange to me; I can't find any logical mistake in your code. Commented Dec 10, 2018 at 10:02
  • Your first example, data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))]['Reinsurer'].replace(to_replace='M',value='H',inplace=True,regex=True), is a clear example of chained indexing. The warning is correct. Always use loc. Commented Dec 10, 2018 at 10:09
  • You should provide a minimal reproducible example with some data as text so we can reproduce your problem with loc. Commented Dec 10, 2018 at 10:10
  • OK, this is really strange. When I was trying to use the .loc function before, I wrote it differently than above: data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))].loc[data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))],'Reinsurer'] = 'H' Basically, I was using .loc on the filtered dataframe. I overcomplicated the code before (though I remember I tried simpler ways as well and had problems with them before I tried the overly complicated one). Commented Dec 10, 2018 at 11:56

1 Answer

Given your Boolean mask, you've demonstrated two ways of applying chained indexing. This is the cause of the warning and the reason why you aren't seeing your logic being applied as you anticipate.

mask = (df['PRODUCT_NAME']=='ŽZ08') & df['BENEFIT'].str.contains('19.08.16')  # df here is your data_prem

Chained indexing: Example #1

df[mask]['Reinsurer'].replace(to_replace='M', value='H', inplace=True, regex=True)

Chained indexing: Example #2

df[mask].loc[mask, 'Reinsurer'] = 'H'
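
A small sketch on made-up rows may help show why both forms leave the original frame untouched. Only the column names come from the question; the data is invented for illustration, and the warnings are silenced purely so the demo runs quietly across pandas versions.

```python
import warnings

import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    'PRODUCT_NAME': ['ŽZ08', 'ŽZ08', 'AB01'],
    'BENEFIT': ['19.08.16 A', '01.01.17 B', '19.08.16 C'],
    'Reinsurer': ['M', 'M', 'M'],
})

mask = (df['PRODUCT_NAME'] == 'ŽZ08') & df['BENEFIT'].str.contains('19.08.16')

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the chained-assignment warnings for this demo
    # Example #1: df[mask] is a new object, so the in-place replace edits a temporary.
    df[mask]['Reinsurer'].replace(to_replace='M', value='H', inplace=True, regex=True)
    # Example #2: the assignment again lands on the temporary df[mask], not on df.
    df[mask].loc[mask, 'Reinsurer'] = 'H'

unchanged = df['Reinsurer'].tolist()
print(unchanged)  # ['M', 'M', 'M'] - the original frame was never modified
```

In both cases the boolean selection df[mask] produces a separate object, so the write never reaches df.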

Avoid chained indexing

You can keep things simple by applying your mask once and using a single loc call:

df.loc[mask, 'Reinsurer'] = 'H'
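
To see the single loc call at work, here is a minimal sketch on invented rows (only the column names come from the question). Two refinements in it are assumptions about what you want and are optional: regex=False makes str.contains treat '19.08.16' as a literal string, since by default it is a regular expression in which '.' matches any character, and the extra Reinsurer == 'M' condition mirrors your original replace, which only touched 'M' values.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    'PRODUCT_NAME': ['ŽZ08', 'ŽZ08', 'AB01', 'ŽZ08'],
    'BENEFIT': ['19.08.16 A', '01.01.17 B', '19.08.16 C', '19.08.16 D'],
    'Reinsurer': ['M', 'M', 'M', 'X'],
})

# Build the mask once; every condition is evaluated against the original frame.
mask = (
    (df['PRODUCT_NAME'] == 'ŽZ08')
    & df['BENEFIT'].str.contains('19.08.16', regex=False)  # literal match (assumption)
    & (df['Reinsurer'] == 'M')  # only overwrite 'M', as replace did (assumption)
)

# One loc call: row selection and column assignment happen in a single step on df.
df.loc[mask, 'Reinsurer'] = 'H'

result = df['Reinsurer'].tolist()
print(result)  # ['H', 'M', 'M', 'X']
```

Only the first row satisfies all three conditions, so it is the only one that changes.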