Say I have a dataframe:

import pandas as pd
import numpy as np
foo = pd.DataFrame(np.random.random((10,5)))

and I create another dataframe from a subset of my data:

bar = foo.iloc[3:5,1:4]

Does bar hold a copy of those elements from foo? Is there any way to create a view of that data instead? If so, what would happen if I tried to modify data in this view? Does pandas provide any sort of copy-on-write mechanism?

1 Answer

Your answer lies in the pandas docs, in the section "Returning a view versus a copy":

Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.

In your example, bar is a view of a slice of foo, so modifying bar also modifies foo. If you want an independent copy, call the copy method on the slice. In the version shown below, pandas does not appear to have a copy-on-write mechanism.
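As a minimal sketch of the copy route: the snippet below takes the same slice but calls copy() on it, then checks that writing to the copy leaves foo unchanged. The names bar_copy and original_value are illustrative only and are not part of the original session. Whether a plain slice behaves as a view can depend on the pandas version, so the sketch only demonstrates the explicit copy.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(np.random.random((10, 5)))

# Remember the value we are about to "overwrite" so we can compare later.
original_value = foo.iloc[3, 1]

# An explicit copy owns its own data, independent of foo.
bar_copy = foo.iloc[3:5, 1:4].copy()

# Position (0, 0) of the copy corresponds to row label 3, column label 1 of foo.
bar_copy.iloc[0, 0] = 5.0

print(bar_copy.iloc[0, 0])                 # the copy holds the new value
print(foo.iloc[3, 1] == original_value)    # foo itself was not modified
```

Calling copy() costs extra memory for the duplicated block, but it makes the intent explicit and does not rely on view semantics.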

See my code example below to illustrate:

In [1]: import pandas as pd
   ...: import numpy as np
   ...: foo = pd.DataFrame(np.random.random((10,5)))
   ...: 

In [2]: pd.__version__
Out[2]: '0.12.0.dev-35312e4'

In [3]: np.__version__
Out[3]: '1.7.1'

In [4]: # DataFrame has copy method
   ...: foo_copy = foo.copy()

In [5]: bar = foo.iloc[3:5,1:4]

In [6]: (bar == foo.iloc[3:5,1:4]) & (bar == foo_copy.iloc[3:5,1:4])
Out[6]: 
      1     2     3
3  True  True  True
4  True  True  True

In [7]: # Changing the view
   ...: bar.loc[3,1] = 5

In [8]: # View and DataFrame still equal
   ...: bar == foo.iloc[3:5,1:4]
Out[8]: 
      1     2     3
3  True  True  True
4  True  True  True

In [9]: # It is now different from a copy of original
   ...: bar == foo_copy.iloc[3:5,1:4]
Out[9]: 
       1     2     3
3  False  True  True
4   True  True  True

6 Comments

So when I do bar.loc[:, ['a', 'b']] it returns a copy, but when I do bar.loc[:, 'a'] it returns a view?
bar.loc[:, 'a'] acts like a slice, which returns a view, whereas bar.loc[:, ['a', 'b']] uses list indexing, which returns a copy. Note that bar.loc[:, ['a']] would also return a copy.
How about bar['a']? Is it a view or a copy?
@davidshinn Is the highlighted quote still in the docs you linked? I can't find it!
It has been revised since the original response (the quote is in version 0.13): pandas.pydata.org/pandas-docs/version/0.13/…
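The list-indexing case from the comments can be checked with a small sketch. The frame df, its column labels 'a' and 'b', and the name sub are assumptions made for illustration; the example only exercises the list form, since the view behaviour of the single-label form has varied across pandas versions. Some older versions may emit a warning on the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with labelled columns, as in the comment thread.
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

# Indexing with a list of labels, even a one-element list, gives a new object.
sub = df.loc[:, ["a"]]

# Write to the result and check whether the original frame changed.
sub.iloc[0, 0] = 99.0

print(sub.iloc[0, 0])    # the selection holds the new value
print(df.loc[0, "a"])    # the original frame keeps its old value
```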