
I have a NumPy 2D array, and I would like to select differently sized ranges from this array, depending on the column index. Here is an example input array, a = np.reshape(np.array(range(15)), (5, 3)):

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]

Then, the list b = [4, 3, 1] determines the range size of each column slice, so that we would get the arrays

[0 3 6 9]
[1 4 7]
[2]

which we can concatenate and flatten to get the final desired output

[0 3 6 9 1 4 7 2]

Currently, to perform this task, I am using the following code

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]

slices = []
for i in range(a.shape[1]):
    slices.append(a[:b[i], i])   # first b[i] rows of column i
c = np.concatenate(slices)

and, if possible, I would like to convert it to a more Pythonic, vectorized form.

Bonus: The same question but now considering that b determines row slices instead of columns.

1 Answer


We can use broadcasting to generate an appropriate mask and then masking does the job -

In [150]: a
Out[150]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [151]: b
Out[151]: [4, 3, 1]

In [152]: mask = np.arange(len(a))[:,None] < b

In [153]: a.T[mask.T]
Out[153]: array([0, 3, 6, 9, 1, 4, 7, 2])
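
For reference, this is what the intermediate mask looks like for this example. The short sketch below just re-creates the a, b and mask from the session above and prints the result:

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]

# np.arange(len(a))[:, None] has shape (5, 1); comparing it against the
# length-3 list b broadcasts to a (5, 3) boolean mask whose column j is
# True for the first b[j] rows:
mask = np.arange(len(a))[:, None] < b
print(mask)
# [[ True  True  True]
#  [ True  True False]
#  [ True  True False]
#  [ True False False]
#  [False False False]]

# Indexing a.T with mask.T walks column by column of a, keeping the
# first b[j] entries of each column:
print(a.T[mask.T])   # [0 3 6 9 1 4 7 2]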

Another way to mask would be -

In [156]: a.T[np.greater.outer(b, np.arange(len(a)))]
Out[156]: array([0, 3, 6, 9, 1, 4, 7, 2])
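
Both versions build the same boolean mask, just transposed: mask.T[j, i] asks whether i < b[j], which is the same comparison as b[j] > i. A quick sanity check, re-using the same a and b as above:

import numpy as np

a = np.reshape(np.array(range(15)), (5, 3))
b = [4, 3, 1]
mask = np.arange(len(a))[:, None] < b

# np.greater.outer(b, np.arange(len(a))) builds the (3, 5) outer
# comparison b[j] > i, i.e. the transpose of the mask above:
outer_mask = np.greater.outer(b, np.arange(len(a)))
print(np.array_equal(outer_mask, mask.T))   # True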

Bonus: Slice per row

If we are required to slice per row based on chunk sizes, we would need to modify a few things -

In [51]: a
Out[51]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# slice lengths per row
In [52]: b
Out[52]: [4, 3, 1]

# Usual loop based solution :
In [53]: np.concatenate([a[i,:b_i] for i,b_i in enumerate(b)])
Out[53]: array([ 0,  1,  2,  3,  5,  6,  7, 10])

# Vectorized mask based solution :
In [54]: a[np.greater.outer(b, np.arange(a.shape[1]))]
Out[54]: array([ 0,  1,  2,  3,  5,  6,  7, 10])
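
For reference, the row-wise mask for this bonus example looks like the small sketch below (re-creating the 3x5 a and the same b as in the session above):

import numpy as np

a = np.reshape(np.array(range(15)), (3, 5))
b = [4, 3, 1]

# Here b holds one slice length per row; b[i] > j is True for the first
# b[i] columns of row i, giving a (3, 5) mask:
mask = np.greater.outer(b, np.arange(a.shape[1]))
print(mask)
# [[ True  True  True  True False]
#  [ True  True  True False False]
#  [ True False False False False]]

# Boolean indexing a directly walks row by row, keeping the first b[i]
# entries of each row:
print(a[mask])   # [ 0  1  2  3  5  6  7 10]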

6 Comments

This solution was very clever, much obliged! Also, this was very enlightening, as I did not know this broadcasting concept, and it seems very useful. As a side note, I just verified that for the second approach, the outer method uses an internal product, so it seems it would be a little slower, is that right? For massive datasets, do you think the difference in speed would be significant?
@xicocaio Second one avoids the transpose, but transpose won't copy. So, in the end, I would think that these two should be comparable.
Ok, but isn't the outer product of the two vectors expensive for very big arrays?
[152] is an 'outer' comparison too. The total number of comparisons is the same. The difference is more a matter of syntax than actual 'work'.
First approach: 3.13 µs ± 20.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each); second approach: 3.72 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each). Not the greatest test because I used the 5x3 array, but the std. dev. suggests the first approach would scale better.
