So I already took a look at this question.

I know you can conditionally replace values in a single column, but what about multiple columns? When I tried it, it didn't seem to work.

import numpy as np

the_data = np.array([[0, 1, 1, 1],
                     [0, 1, 3, 1],
                     [3, 4, 1, 3],
                     [0, 1, 2, 0],
                     [2, 1, 0, 0]])

the_data[:,0][the_data[:,0] == 0] = -1 # this works

columns_to_replace = [0, 1, 3]
the_data[:,columns_to_replace][the_data[:,columns_to_replace] == 0] = -1 # this does not work

I initially thought the second case doesn't work because the_data[:,columns_to_replace] creates a copy instead of directly referencing the elements. However, if that were the case, the first case shouldn't work either, since it only differs in replacing a single column.

1 Answer

You're indeed getting a copy because you're using advanced indexing:

Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

(Taken from the docs)

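The first case works because the_data[:,0] uses basic slicing, which returns a view: the boolean-mask assignment then writes through that view into the_data. In the second case, the_data[:,columns_to_replace] is evaluated first and returns a temporary copy, so the assignment modifies the copy and the_data is left untouched.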
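To check the view-versus-copy difference directly, here is a small sketch using np.shares_memory on the array from the question. The names single, multi, and zeros_left are just illustrative and are not from the original post.

```python
import numpy as np

the_data = np.array([[0, 1, 1, 1],
                     [0, 1, 3, 1],
                     [3, 4, 1, 3],
                     [0, 1, 2, 0],
                     [2, 1, 0, 0]])

single = the_data[:, 0]          # basic slicing: a view into the_data
multi = the_data[:, [0, 1, 3]]   # advanced (integer-list) indexing: a new array

single_is_view = np.shares_memory(single, the_data)
multi_is_view = np.shares_memory(multi, the_data)

# Assigning into the advanced-indexing result only changes that new array.
multi[multi == 0] = -1
zeros_left = int((the_data == 0).sum())  # zeros still present in the_data

print(single_is_view, multi_is_view, zeros_left)
```

If single shares memory with the_data and multi does not, that matches the behaviour described in the quoted docs.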


I think you can do this without copying, but still with some memory overhead:

columns_to_replace = [0, 1, 3]

mask = np.zeros(the_data.shape, bool) # boolean mask, one byte per element
mask[:, columns_to_replace] = True    # mark the columns to consider

np.place(the_data, (the_data == 0) & mask, [-1]) # writes into the_data in place, no copy of the_data
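As a possible variation on the snippet above, the column mask can be kept one-dimensional and left to broadcast against the comparison result, which avoids allocating a full-size mask up front. This is only a sketch: replace_in_columns is a hypothetical helper name, not a NumPy function, and the comparison arr == old still allocates a temporary boolean array of the full shape.

```python
import numpy as np

def replace_in_columns(arr, columns, old, new):
    """Replace `old` with `new` in the given columns of a 2-D array, in place.

    Hypothetical helper for illustration; not part of NumPy.
    """
    col_mask = np.zeros(arr.shape[1], dtype=bool)  # one flag per column
    col_mask[columns] = True
    # (arr == old) has shape (rows, cols); col_mask broadcasts across rows.
    arr[(arr == old) & col_mask] = new
    return arr

the_data = np.array([[0, 1, 1, 1],
                     [0, 1, 3, 1],
                     [3, 4, 1, 3],
                     [0, 1, 2, 0],
                     [2, 1, 0, 0]])

replace_in_columns(the_data, [0, 1, 3], 0, -1)
print(the_data)
```

The zero in column 2 of the last row should be left alone, since column 2 is not in the list.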

8 Comments

Is there a workaround to this that doesn't invoke a copy? Or is looping the only option?
@user3426943, looks like there is a workaround. Please see my edit.
@user3483203, it still allocates memory, it just doesn't initialize it (and you'll have to fill it in anyway, with ones and zeros).
the_data[(the_data==0)*mask]=-1 should also work. The key is to find a mask that combines the columns criteria and the 0's test, as you do with the boolean *.
@hpaulj, and it does work, indeed, and it also looks simpler. I was searching for a solution that would very clearly say: "I'm not creating a copy instead of a view", so my solution's a bit too verbose about that.
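For anyone comparing the two variants discussed in these comments, here is a quick sketch that runs both on copies of the question's array and checks that they agree. The names original, via_place, and via_setitem are illustrative only.

```python
import numpy as np

original = np.array([[0, 1, 1, 1],
                     [0, 1, 3, 1],
                     [3, 4, 1, 3],
                     [0, 1, 2, 0],
                     [2, 1, 0, 0]])

columns_to_replace = [0, 1, 3]
mask = np.zeros(original.shape, bool)
mask[:, columns_to_replace] = True

# Variant from the answer: np.place with the combined mask.
via_place = original.copy()
np.place(via_place, (via_place == 0) & mask, [-1])

# Variant from the comment: a single boolean-mask assignment.
# For boolean arrays, * behaves like a logical AND.
via_setitem = original.copy()
via_setitem[(via_setitem == 0) * mask] = -1

same = np.array_equal(via_place, via_setitem)
print(same)
```

The single assignment works because the whole indexing expression is on the left-hand side, so NumPy writes into the array directly instead of first producing a copy and then assigning into that.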