
What I have:

import numpy as np
np.random.seed(42)
dlen = 250000
data = np.random.rand(dlen, 3, 3)
mask = np.random.choice([0, 1, 2], dlen)

What I want to get:

[[0.37454012 0.95071431 0.73199394], 
 [0.83244264 0.21233911 0.18182497], 
 [0.13949386 0.29214465 0.36636184], 
 [0.94888554 0.96563203 0.80839735], 
 [0.44015249 0.12203823 0.49517691],
 ....
Shape: (250000, 3)

What I tried:

data[:,mask,:]

{MemoryError}Unable to allocate 1.36 TiB for an array with shape (250000, 250000, 3) and data type float64
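Here is a small-scale sketch of why data[:, mask, :] runs out of memory. When an integer array indexes one axis and the other axes are plain slices, NumPy does not pair mask[i] with row i. Instead, it applies every index in mask to every row, so axis 1 is replaced by an axis of length len(mask). The small shapes below are assumptions chosen only to keep the example tiny.

```python
import numpy as np

# Small stand-ins for the question's arrays (shapes chosen for illustration)
n = 5
data = np.arange(n * 3 * 3, dtype=float).reshape(n, 3, 3)
mask = np.array([0, 1, 1, 2, 1])

# An integer array on axis 1 alone is NOT paired with axis 0:
# all of `mask` is applied to every row along axis 0,
# so axis 1 (length 3) becomes an axis of length len(mask).
blown_up = data[:, mask, :]
print(blown_up.shape)  # (5, 5, 3)

# With n = 250000 the same rule gives shape (250000, 250000, 3):
# 250000 * 250000 * 3 float64 values at 8 bytes each, expressed in TiB.
print(250000 * 250000 * 3 * 8 / 2**40)  # about 1.36
```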

What gives the correct result but looks strange:

data[np.arange(data.shape[0]), mask, :]

So what's the correct way to use this mask?

Update: For each entry along the first axis, the mask gives the index to select along the second axis, i.e. one row of each 3x3 block. Example for an array with shape (2, 3, 3):

array = np.array([[[5, 6, 7], [7, 8, 9], [2, 3, 4]],
                  [[2, 1, 0], [7, 6, 5], [7, 6, 5]]])
mask = np.array([1, 0])
result = [[7, 8, 9],
          [2, 1, 0]]
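For reference, the indexing expression that the question says "looks strange" reproduces this small example exactly. Each entry of np.arange(2) is paired with the corresponding entry of mask.

```python
import numpy as np

array = np.array([[[5, 6, 7], [7, 8, 9], [2, 3, 4]],
                  [[2, 1, 0], [7, 6, 5], [7, 6, 5]]])
mask = np.array([1, 0])

# Pair index i on axis 0 with index mask[i] on axis 1
result = array[np.arange(array.shape[0]), mask, :]
print(result)
# [[7 8 9]
#  [2 1 0]]
```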
  • Can you explain in words what your mask is supposed to accomplish? Commented Jul 10, 2021 at 13:22
  • Tried to explain in the question. Commented Jul 10, 2021 at 13:31
  • Your example array has only 2 axes, not 3. Commented Jul 10, 2021 at 13:43
  • "...gives the correct result but looks strange": that works because you are using an index array. Commented Jul 10, 2021 at 13:45
  • These are the three axes in the example now. Commented Jul 10, 2021 at 13:55

1 Answer

data[np.arange(data.shape[0]), mask, :]

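That works because it uses integer index arrays on both of the first two axes. np.arange(data.shape[0]) and mask are broadcast together and paired element by element, so row i of the result is data[i, mask[i], :].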
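The following sketch checks that pairing against an explicit Python loop. It also shows np.take_along_axis, a NumPy function the answer does not mention, which expresses the same selection. The random test data and its size are assumptions made for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3, 3))
mask = rng.integers(0, 3, size=1000)

# Advanced indexing: element i of the result is data[i, mask[i], :]
fancy = data[np.arange(data.shape[0]), mask, :]

# The same selection written as a (slow) Python loop, for comparison
looped = np.stack([data[i, mask[i], :] for i in range(data.shape[0])])

# Alternative: take_along_axis wants indices with the same ndim as data.
# Shape (N, 1, 1) broadcasts over the last axis and yields (N, 1, 3).
along = np.take_along_axis(data, mask[:, None, None], axis=1)[:, 0, :]

print(fancy.shape)  # (1000, 3)
print(np.array_equal(fancy, looped), np.array_equal(fancy, along))  # True True
```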

When I hear the term mask, I think of boolean indexing. Your integer mask can be converted to a boolean mask and used the way you want.

>>> data.shape                 
(250000, 3, 3)
>>> mask.shape
(250000,)
>>> q = mask[:,None] == [0,1,2]
>>> q.shape
(250000, 3)
>>> q[:5]        
array([[ True, False, False],
       [False,  True, False],
       [False,  True, False],
       [False, False,  True],
       [False,  True, False]])
>>> r = data[q]
>>> r.shape
(250000, 3)
>>> r[:10]
array([[0.37454012, 0.95071431, 0.73199394],
       [0.83244264, 0.21233911, 0.18182497],
       [0.13949386, 0.29214465, 0.36636184],
       [0.94888554, 0.96563203, 0.80839735],
       [0.44015249, 0.12203823, 0.49517691],
       [0.66252228, 0.31171108, 0.52006802],
       [0.59789998, 0.92187424, 0.0884925 ],
       [0.14092422, 0.80219698, 0.07455064],
       [0.00552212, 0.81546143, 0.70685734],
       [0.31098232, 0.32518332, 0.72960618]])
>>>

You could use the length of the second dimension to make it a little more generic:

>>> q = mask[:,None] == np.arange(data.shape[1])
>>> q[:5]                                        
array([[ True, False, False], 
       [False,  True, False], 
       [False,  True, False], 
       [False, False,  True], 
       [False,  True, False]])
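As a sanity check, the generic boolean mask and the integer-index approach select the same rows. Boolean indexing over the first two axes keeps the True positions in row-major order, and each row of q has exactly one True. The shapes here use a second axis of length 4 and a last axis of length 2. These values are assumptions chosen to exercise the generic form.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed shapes: second axis of length 4 instead of 3
data = rng.random((6, 4, 2))
mask = rng.integers(0, data.shape[1], size=data.shape[0])

# Boolean mask of shape (6, 4): exactly one True per row, at column mask[i]
q = mask[:, None] == np.arange(data.shape[1])

# Boolean indexing over the first two axes returns the selected rows in order
r = data[q]

print(r.shape)  # (6, 2)
print(np.array_equal(r, data[np.arange(data.shape[0]), mask]))  # True
```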

If you control how the mask is constructed, you might want to build it as a boolean array in the first place.
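A common NumPy idiom, which the answer does not mention, builds a one-hot boolean mask directly from integer labels. Row j of a k x k boolean identity matrix is True only at position j, so indexing the identity by mask gives one such row per entry. The mask values below are chosen to match the first rows of q shown above.

```python
import numpy as np

mask = np.array([0, 1, 1, 2, 1])
k = 3  # length of the axis being selected (data.shape[1] in the question)

# Fancy-index the boolean identity: row mask[i] is one-hot at column mask[i]
bool_mask = np.eye(k, dtype=bool)[mask]

# Same values as the comparison-based mask from the answer
print(np.array_equal(bool_mask, mask[:, None] == np.arange(k)))  # True
```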


If this is new code, you might want to use NumPy's newer random Generator API (np.random.default_rng, available since NumPy 1.17) instead of the legacy np.random.seed / np.random.rand functions.
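Here is a sketch of the question's setup using the Generator API. This assumes NumPy 1.17 or later. The seed is reused for illustration, but the generated numbers will not match the np.random.seed(42) values shown above, because the two APIs use different generators.

```python
import numpy as np

# Generator-based version of the question's setup (NumPy >= 1.17)
rng = np.random.default_rng(42)
dlen = 250000
data = rng.random((dlen, 3, 3))
mask = rng.integers(0, 3, size=dlen)  # upper bound is exclusive: values 0, 1, 2

result = data[np.arange(dlen), mask, :]
print(result.shape)  # (250000, 3)
```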


1 Comment

Very helpful, thank you. To construct the mask I use:

mask = np.argmin(some_array, axis=1)

Instead, I can do something like this:

boolean_mask = np.min(some_array, axis=1)[:, None] == some_array

Just a note for me.
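The two constructions in the comment differ when a row contains ties. np.argmin returns only the first minimum, while comparing against the row minimum marks every tied position. In that case, data[boolean_mask] would return more than one row per entry. The comment's some_array is not shown, so the array below is made up to illustrate a tie.

```python
import numpy as np

# Hypothetical input with a tie in the second row
some_array = np.array([[3.0, 1.0, 2.0],
                       [0.5, 4.0, 0.5]])

idx = np.argmin(some_array, axis=1)                      # first minimum per row
eq_mask = some_array.min(axis=1)[:, None] == some_array  # every minimum per row

# One-hot mask built from argmin: always exactly one True per row
onehot = idx[:, None] == np.arange(some_array.shape[1])

print(idx)                  # [1 0]
print(eq_mask.sum(axis=1))  # [1 2]  (the tie gives two True values in row 1)
print(onehot.sum(axis=1))   # [1 1]
```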
