I have a dataframe with multiple columns and I simply want to update a column with new values: df['Z'] = df['A'] % df['C']/2. However, I keep getting a SettingWithCopyWarning, even when I use .loc[] or when I drop() the column and add it again.

:75: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The warning disappears with the .assign() method, but it is painfully slower. Here is a comparison:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randn(2000000, 26), 
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

%timeit df['Z'] = df['A'] % df['C']/2
119 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.loc[:, 'Z'] = df['A'] % df['C']/2
118 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.assign(Z=df['A'] % df['C']/2)
857 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So what's the optimal way to update a column in the dataframe? Note that I don't have the option to create multiple copies of the same dataframe because of its huge size.

  • Did you try your sample data? I did not get the SettingWithCopyWarning with it. Commented Aug 14, 2020 at 1:01
  • stackoverflow.com/questions/20625582/… Commented Aug 14, 2020 at 1:04
  • This is just a warning - it won't affect anything. FWIW, this changes across versions and I don't see it in version 1.01. Commented Aug 14, 2020 at 1:11
  • I do get these warnings even when using .loc. You can stop them by calling pd.set_option('mode.chained_assignment', None) in main... The pandas explanation does not convince me; it might be a bug. Commented Aug 14, 2020 at 1:11
  • @BEN_YO Actually this sample code is meant for comparing the three assignment operations only. Commented Aug 14, 2020 at 1:25

1 Answer

tl;dr - make a copy of the slice using .copy(), or suppress the warning with pd.set_option('mode.chained_assignment', None)

There are some great posts about SettingWithCopyWarning. First off, this is just a warning and not an error. Most of the time it is warning you about behavior you didn't really intend, or about something you don't really care about.

Now, let's avoid this warning. Using your data, I am first going to reproduce the warning on purpose.

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randn(2000000, 26), 
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

# If we execute df['Z'] = df['A'] % df['C']/2 on the full dataframe, there is no warning.
df['Z'] = df['A'] % df['C']/2

# However, let's slice this dataframe, dropping only the last row, using this syntax
df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

In this case, the warning is letting you know that you may also be changing the original df object.

df = pd.DataFrame(data=np.random.randn(2000000, 26), 
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])

This returns the above warning and True: modifying the slice also changed the original df object.
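
The comparison above checks that the slice and the original agree. A more direct way to see whether the original was touched is to snapshot its column first and compare afterwards. The following is a minimal sketch on a small frame; the 6x3 shape, the seed, and the names z_before and original_changed are illustrative assumptions, not part of the original post. The printed result depends on the pandas version and on settings such as Copy-on-Write, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption: 6 rows, columns A, C, Z).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=list("ACZ"))

# Independent snapshot of the original column, taken before touching the slice.
z_before = df["Z"].copy()

# Same pattern as above: slice without .copy(), then assign a column.
# Depending on the pandas version this may emit SettingWithCopyWarning.
df_slice = df.loc[:4]
df_slice["Z"] = df_slice["A"] % df_slice["C"] / 2

# True if the assignment on the slice also altered the original frame.
original_changed = not df["Z"].equals(z_before)
print(pd.__version__, "original changed:", original_changed)
```

Running this on your own pandas version shows which behavior you actually get, rather than relying on what another version did.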

Now, to avoid the warning and leave the original object unchanged, use .copy():

df = pd.DataFrame(data=np.random.randn(2000000, 26), 
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

df_slice = df.loc[:1999998].copy()
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])

This returns no warning and False.

So, one way to keep the performance of the first and second methods is to call .copy() when creating your slice/view of a dataframe. However, you are correct that this takes extra memory. To limit that, overwrite your dataframe with the copy.
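
One possible reading of "overwrite your dataframe" is to rebind the same variable name to the copied slice, so that the full frame is no longer referenced once the copy exists. This is a hedged sketch on a small frame (the shape, seed, and column subset are assumptions for illustration); the extra memory is still needed while the copy is being made.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption: 6 rows, columns A, C, Z).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=list("ACZ"))

# Rebind the name df to an independent copy of the slice. If nothing else
# refers to the full frame, it becomes eligible for garbage collection.
df = df.loc[:4].copy()

# Plain column assignment on the independent copy.
df["Z"] = df["A"] % df["C"] / 2
```

If you need the full frame later, this pattern does not apply, because the rows outside the slice are gone after the rebinding.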

Or you can turn this warning off using:

pd.set_option('mode.chained_assignment', None)
df = pd.DataFrame(data=np.random.randn(2000000, 26), 
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])

This returns no warning and True.
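
pd.set_option changes the setting for the whole session. If you would rather silence the warning only around one assignment, Python's standard warnings module can scope the suppression. This is a sketch under assumptions: the small frame and the name swc_warning are illustrative, and the location of the warning class differs between pandas versions, so the code falls back to the generic Warning class when it is not found.

```python
import warnings

import numpy as np
import pandas as pd

# Small illustrative frame (assumption: 6 rows, columns A, C, Z).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=list("ACZ"))
df_slice = df.loc[:4]

# Look the warning class up defensively. If this pandas version does not
# expose it under pd.errors, fall back to Warning (which ignores all
# warnings inside the block below, not just this one).
swc_warning = getattr(pd.errors, "SettingWithCopyWarning", Warning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=swc_warning)
    df_slice["Z"] = df_slice["A"] % df_slice["C"] / 2
# Outside the with-block the previous warning filters are restored.
```

As with the global option, this only hides the message; it does not change whether the original frame is modified.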

In short, pandas sometimes creates a new object for a slice of a dataframe, and sometimes it returns a view of the original dataframe. Exactly when pandas does which is understood by few and is not very well documented, as far as I could find.

There is a strong hint as to when this warning will appear: the _is_view attribute.

df_slice = df.loc[:1999998]
df_slice._is_view

Returns True, hence the SettingWithCopyWarning might happen.

df_slice = df.loc[:1999998].copy()
df_slice._is_view

Returns False.
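
Since _is_view is a private attribute, a complementary check that uses only public APIs is to ask NumPy whether the underlying arrays overlap in memory. This is a rough sketch, not a replacement for the attribute: the small frame and the names slice_shares and copy_shares are assumptions, and the result for the plain slice depends on the pandas version and on the frame's dtypes, so only the .copy() case has a predictable outcome.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption: 6 rows, columns A, C, Z).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=list("ACZ"))

df_slice = df.loc[:4]         # may or may not share memory with df
df_copy = df.loc[:4].copy()   # independent data

# np.shares_memory reports whether two arrays overlap in memory.
slice_shares = np.shares_memory(df["A"].to_numpy(), df_slice["A"].to_numpy())
copy_shares = np.shares_memory(df["A"].to_numpy(), df_copy["A"].to_numpy())

print("slice shares memory with df:", slice_shares)
print("copy shares memory with df:", copy_shares)
```

Shared memory does not by itself mean a warning will be raised, so treat this as one more hint alongside _is_view.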

3 Comments

_is_view is a good trick and it does seem to work on my example dataframe. However, it doesn't seem to work every time. In my original dataframe (which I can't share), _is_view returns False but I still get the warning.
In your example df_slice = df.loc[:1999998], I can see different data in df_slice and in df.loc[:1999998]. Then why does the all(df.loc[:1999998, 'Z'] == df_slice['Z']) comparison return True?
_is_view is only a hint. I could not find documentation on when pandas creates a view or a new object. If you don't use .copy(), then when you change df_slice, you also change df, therefore all(df.loc...) will yield True. If you did use .copy(), then you have a separate object and all(df.loc..) will yield False. It is important that you recreate df in between each of these tests.
