View of a view of a numpy array is a copy?

Question

If you change a view of a numpy array, the original array is also altered. This is intended behaviour.

import numpy as np

arr = np.array([1, 2, 3])
mask = np.array([True, False, False])
arr[mask] = 0
arr
# Out: array([0, 2, 3])

However, if I take a view of such a view, and change that, then the original array is not altered:

arr = np.array([1,2,3])
mask_1 = np.array([True, False, False])
mask_1_arr = arr[mask_1]  # Becomes: array([1])
mask_2 = np.array([True])
mask_1_arr[mask_2] = 0
arr
# Out: array([1, 2, 3])

This implies to me that, when you take a view of a view, you actually get back a copy. Is this correct? Why is this?

The same behaviour occurs if I use numpy arrays of numerical indices instead of a numpy array of boolean values. (E.g. arr[np.array([0])][np.array([0])] = 0 doesn't change the first element of arr to 0.)

unutbu · Accepted Answer · 2016-08-04 16:26:51Z

Selection by basic slicing always returns a view. Selection by advanced indexing always returns a copy. Selection by boolean mask is a form of advanced indexing. (The other form of advanced indexing is selection by integer array.)
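One way to check these claims yourself is np.shares_memory, which reports whether two arrays overlap in memory. This is only a minimal sketch; the array contents and the names sliced, masked and fancy are illustrative, not from the question.

```python
import numpy as np

arr = np.arange(5)

sliced = arr[1:4]                                         # basic slicing
masked = arr[np.array([False, True, True, True, False])]  # boolean mask
fancy = arr[np.array([1, 2, 3])]                          # integer array

print(np.shares_memory(arr, sliced))   # True: sliced is a view onto arr's buffer
print(np.shares_memory(arr, masked))   # False: masked is a fresh copy
print(np.shares_memory(arr, fancy))    # False: fancy is a fresh copy
print(sliced.base is arr)              # True: the view remembers its parent
```

All three results hold the same values (1, 2, 3); only the sliced one is backed by arr's memory.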

However, assignment by advanced indexing affects the original array.

So

mask = np.array([True, False, False])
arr[mask] = 0

affects arr because it is an assignment. In contrast,

mask_1_arr = arr[mask_1]

is selection by boolean mask, so mask_1_arr is a copy of part of arr. Once you have a copy, the jig is up. When Python executes

mask_2 = np.array([True])
mask_1_arr[mask_2] = 0

the assignment affects mask_1_arr, but since mask_1_arr is a copy, it has no effect on arr.

|            | basic slicing    | advanced indexing |
|------------+------------------+-------------------|
| selection  | view             | copy              |
| assignment | affects original | affects original  |
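The table can be read off a chained-indexing experiment. The sketch below uses three small throwaway arrays (a, b, c are hypothetical names): the first index in a chain is always a selection, so what matters is whether that selection produced a view or a copy.

```python
import numpy as np

a = np.array([1, 2, 3])
a[0:2][0:1] = 0                              # slice selection is a view, so the write reaches a
print(a.tolist())                            # [0, 2, 3]

b = np.array([1, 2, 3])
b[np.array([True, True, False])][0:1] = 0    # mask selection is a copy, so the write is lost
print(b.tolist())                            # [1, 2, 3]

c = np.array([1, 2, 3])
c[np.array([True, True, False])] = 0         # direct assignment by mask modifies c
print(c.tolist())                            # [0, 0, 3]
```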

Under the hood, arr[mask] = something causes Python to call arr.__setitem__(mask, something). The ndarray.__setitem__ method is implemented to modify arr. After all, that is the natural thing one should expect __setitem__ to do.
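To make the dispatch concrete, the two method calls can be written out by hand. You would not normally call these dunder methods directly; the sketch below does so only to show which object each call operates on, and the name tmp is an illustrative addition.

```python
import numpy as np

arr = np.array([1, 2, 3])
mask = np.array([True, False, False])

# Equivalent to: arr[mask] = 0
arr.__setitem__(mask, 0)
print(arr.tolist())          # [0, 2, 3]

# Equivalent to: tmp = arr[mask]; tmp[np.array([True])] = 9
tmp = arr.__getitem__(mask)              # returns a new array (a copy)
tmp.__setitem__(np.array([True]), 9)     # modifies tmp, not arr
print(tmp.tolist())          # [9]
print(arr.tolist())          # [0, 2, 3]
```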

In contrast, when arr[indexer] is used as an expression, Python calls arr.__getitem__(indexer). When indexer is a slice, the selected elements are regularly spaced, which allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to describe them with strides and an offset. Hence a copy must be returned.
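The strides-and-offset point can be inspected directly. In this sketch the dtype is pinned to int64 so the byte counts are predictable, and the offset is computed from the data pointers exposed by __array_interface__; the specific slice 2::3 is just an example.

```python
import numpy as np

arr = np.arange(10, dtype=np.int64)   # 8 bytes per element
view = arr[2::3]                      # elements at positions 2, 5, 8

print(arr.strides)                    # (8,): step one element at a time
print(view.strides)                   # (24,): step three elements at a time

# Byte distance between the start of the view and the start of arr
offset = (view.__array_interface__['data'][0]
          - arr.__array_interface__['data'][0])
print(offset)                         # 16: the view starts two elements in

# Positions such as 0, 1, 5 have no single constant step between them,
# so they cannot be described this way and NumPy copies instead.
irregular = arr[np.array([0, 1, 5])]
print(np.shares_memory(arr, irregular))   # False
```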

edited Aug 4, 2016 at 16:26

answered Aug 4, 2016 at 13:39

unutbu


3 Comments

acdr Over a year ago

This all makes sense! I guess then that there's no easy way to do a one-liner like arr[x][y] = 1. Right now I'm doing this by assigning an intermediate value, e.g. int = arr[x]; int[y] = 1; arr[x] = int.
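For reference, the three-step round trip described in this comment might look like the sketch below. The array and both masks are made-up sample data, and the temporary is named tmp purely for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
x = np.array([True, False, True, True])   # selects 10, 30, 40
y = np.array([False, True, True])         # of those, selects 30, 40

tmp = arr[x]      # advanced indexing: tmp is a copy
tmp[y] = 1        # modify the copy
arr[x] = tmp      # assign the copy back through the same mask

print(arr.tolist())   # [10, 1, 1] would be wrong; actual result: [10, 20, 1, 1]
```

The write-back works because the final line is an assignment by advanced indexing, which modifies arr in place.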

unutbu Over a year ago

If x is a boolean mask, a one-line equivalent would be np.put(arr, np.flatnonzero(x)[y], 1). np.flatnonzero(x) converts the boolean mask to a 1D integer array. You can then select some subset of those integers using np.flatnonzero(x)[y] where y could be a basic slice or advanced indexer. Then np.put(arr, np.flatnonzero(x)[y], 1) works since it is roughly equivalent to arr.flat[np.flatnonzero(x)[y]] = 1.

unutbu Over a year ago

If arr is 1-dimensional, arr[np.flatnonzero(x)[y]] = 1 also works. The purpose of np.put above is to provide an answer that works even if arr is n-dimensional.
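A small sketch of the np.put approach on a 2-D array, assuming x is a boolean mask with the same shape as arr and y is a basic slice. The sample data and the name flat_idx are illustrative only.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
x = arr % 2 == 1                      # mask selecting the odd values 1, 3, 5
y = slice(0, 2)                       # keep the first two of the selected elements

flat_idx = np.flatnonzero(x)[y]       # flat positions of the elements to change
print(flat_idx.tolist())              # [1, 3]

np.put(arr, flat_idx, -1)             # in-place write at those flat positions
print(arr.tolist())                   # [[0, -1, 2], [-1, 4, 5]]
```

Because np.flatnonzero and np.put both work on flattened positions, no intermediate copy of the selected values is needed.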
