
I would like to slice a DataFrame with a Boolean index obtaining a copy, and then do stuff on that copy independently of the original DataFrame.

Judging from this answer, selecting with .loc using a Boolean array will hand me back a copy, but then, if I try to change the copy, SettingWithCopyWarning gets in the way. Would this then be the correct way:

import numpy as np
import pandas as pd
d1 = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
# create a new dataframe from the sliced copy
d2 = pd.DataFrame(d1.loc[d1.a > 1, :])
# do stuff with d2, keep d1 unchanged
  • SettingWithCopyWarning is just a warning. It tells you that modifications you make to that DataFrame will not change the original DataFrame. You can disable the warning altogether or just use d2.is_copy = None after the assignment. Commented Jul 7, 2017 at 11:10
  • DataFrame.is_copy is no longer in the API. Commented Feb 23, 2021 at 20:39

1 Answer


You need to call copy after boolean indexing; a new DataFrame constructor is not necessary:

d2 = d1[d1.a > 1].copy()

Explanation of warning:

Without .copy(), if you modify values in d2 later you will find that the modifications do not propagate back to the original data (d1), and pandas emits a SettingWithCopyWarning.
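
A minimal sketch of the behaviour, assuming a small hand-written DataFrame (the values are illustrative and not taken from the question): after an explicit copy, assignments to d2 leave d1 untouched.

```python
import pandas as pd

# Illustrative data (an assumption, not the random data from the question)
d1 = pd.DataFrame({"a": [0.5, 1.5, 2.0, -1.0], "b": [10, 20, 30, 40]})

# Boolean indexing followed by an explicit copy gives an independent DataFrame
d2 = d1[d1.a > 1].copy()

# Modify the copy only
d2["b"] = 0

print(d1)  # column b of d1 still holds 10, 20, 30, 40
print(d2)  # only the rows where a > 1, with b set to 0
```

The explicit copy states the intent to work on a separate object, so later assignments to d2 are unambiguous.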


2 Comments

That's what I was using, I changed it because I seem to have read in the docs somewhere that .copy() is not the recommended way, but I may have been mistaken.
Yes, if you need a new object, you need copy. If you don't need the original, d1 = d1[d1.a > 1] should also work.
