
I have a NumPy 2D array, and I would like to select differently sized ranges from this array, depending on the column index. Here is an example input array, a = np.reshape(np.array(range(15)), (5, 3)):

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]

Then, the list b = [4, 3, 1] determines the range size of each column slice, so that we would get the arrays

[0 3 6 9]
[1 4 7]
[2]

which we can concatenate and flatten to get the final desired output

[0 3 6 9 1 4 7 2]

Currently, to perform this task, I am using the following code

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]

slices = []
for i in range(a.shape[1]):
    slices.append(a[:b[i], i])   # first b[i] rows of column i
c = np.concatenate(slices)

and, if possible, I would like to convert it to a more Pythonic, vectorized form.

Bonus: The same question but now considering that b determines row slices instead of columns.

1 Answer


We can use broadcasting to generate an appropriate mask and then masking does the job -

In [150]: a
Out[150]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [151]: b
Out[151]: [4, 3, 1]

In [152]: mask = np.arange(len(a))[:,None] < b

In [153]: a.T[mask.T]
Out[153]: array([0, 3, 6, 9, 1, 4, 7, 2])
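
For reference, this is what the intermediate mask looks like for this example. The short sketch below just re-creates the a, b and mask from the session above and prints the result:

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]

# np.arange(len(a))[:, None] has shape (5, 1); comparing it against the
# length-3 list b broadcasts to a (5, 3) boolean mask whose column j is
# True for the first b[j] rows:
mask = np.arange(len(a))[:, None] < b
print(mask)
# [[ True  True  True]
#  [ True  True False]
#  [ True  True False]
#  [ True False False]
#  [False False False]]

# Indexing a.T with mask.T walks column by column of a, keeping the
# first b[j] entries of each column:
print(a.T[mask.T])   # [0 3 6 9 1 4 7 2]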

Another way to mask would be -

In [156]: a.T[np.greater.outer(b, np.arange(len(a)))]
Out[156]: array([0, 3, 6, 9, 1, 4, 7, 2])
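
Both versions build the same boolean mask, just transposed: mask.T[j, i] asks whether i < b[j], which is the same comparison as b[j] > i. A quick sanity check, re-using the same a and b as above:

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]
mask = np.arange(len(a))[:, None] < b

# np.greater.outer(b, np.arange(len(a))) builds the (3, 5) outer
# comparison b[j] > i, i.e. the transpose of the mask above:
outer_mask = np.greater.outer(b, np.arange(len(a)))
print(np.array_equal(outer_mask, mask.T))   # True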

Bonus: Slice per row

If we are required to slice per row based on chunk sizes, we would need to modify a few things -

In [51]: a
Out[51]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# slice lengths per row
In [52]: b
Out[52]: [4, 3, 1]

# Usual loop based solution :
In [53]: np.concatenate([a[i,:b_i] for i,b_i in enumerate(b)])
Out[53]: array([ 0,  1,  2,  3,  5,  6,  7, 10])

# Vectorized mask based solution :
In [54]: a[np.greater.outer(b, np.arange(a.shape[1]))]
Out[54]: array([ 0,  1,  2,  3,  5,  6,  7, 10])
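
For reference, the row-wise mask for this bonus example looks like the small sketch below (re-creating the 3x5 a and the same b as in the session above):

import numpy as np

a = np.reshape(np.array(range(15)), (3, 5))
b = [4, 3, 1]

# Here b holds one slice length per row; b[i] > j is True for the first
# b[i] columns of row i, giving a (3, 5) mask:
mask = np.greater.outer(b, np.arange(a.shape[1]))
print(mask)
# [[ True  True  True  True False]
#  [ True  True  True False False]
#  [ True False False False False]]

# Boolean indexing a directly walks row by row, keeping the first b[i]
# entries of each row:
print(a[mask])   # [ 0  1  2  3  5  6  7 10]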

6 Comments

This solution was very clever, much obliged! Also, this was very enlightening, as I did not know this broadcasting concept, and it seems very useful. As a side note, I just verified that for the second approach, the outer method uses an internal product, so it seems it would be a little slower, is that right? For massive datasets, do you think the difference in speed would be significant?
@xicocaio Second one avoids the transpose, but transpose won't copy. So, in the end, I would think that these two should be comparable.
Ok, but isn't the outer product of the two vectors expensive for very big arrays?
[152] is an 'outer' comparison too. The total number of comparisons is the same. The difference is more a matter of syntax than actual 'work'.
First approach: 3.13 µs ± 20.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each); second approach: 3.72 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each). Not the greatest test because I used the 5x3 array, but the std. dev. suggests the first approach would scale better.
