Numpy sum of 2D array along axis=1, floating range

Question

I would like to perform a sum of a 2D array over the second axis, but on a range which is variable. Not vectorised it is:`

import numpy as np

nx = 3
ny = 5
a = np.ones((nx, ny))
left_bnd  = np.array([0, 1, 0])
right_bnd = np.array([2, 2, 4])

b = np.zeros(nx)
for jx in range(nx):
    b[jx] = np.sum(a[jx, left_bnd[jx]: right_bnd[jx]])

print(b)

The output, b, is [2. 1. 4.] I'd love to vectorise the loop, sort of

b = np.sum(a[:, left_bnd[:]: right_bnd[:], axis=1)

to speed up the calculation, because my "n" is typically a few 1e6. Unfortunately I cannot find a proper working syntax.

Slice bounds have to be scalars, so there's no way of left_bnd[:]: right_bnd[:] (why the use of [:] here?). Note also that the row slices will vary in length - another indication that compiled whole-array methods don't exist. — hpaulj
– hpaulj, Commented Mar 28, 2022 at 16:02

3 revs · Accepted Answer · 2022-03-28 16:04:03Z

A jitted numba implementation with manual summation in a for loop is around ~100x faster. Using np.sum with slicing inside the numba function was only half as fast. This solution assumes that all slices are within valid bounds.

Generation of sufficiently large sample data for benchmarking

import numpy as np
import numba as nb
np.random.seed(42)    # just for reproducibility

n, m = 5000, 100
a = np.random.rand(n,m)
bnd_l, bnd_r = np.sort(np.random.randint(m+1, size=(n,2))).T

Jitted with numba. Please make sure to benchmark compiled hot code by running the function at least twice.

@nb.njit
def slice_sum(a, bnd_l, bnd_r):
    b = np.zeros(a.shape[0])
    for j in range(a.shape[0]):
        for i in range(bnd_l[j], bnd_r[j]):
            b[j] += a[j,i]
    return b
slice_sum(a, bnd_l, bnd_r)

Output

# %timeit 1000 loops, best of 5: 297 µs per loop
array([ 4.31060848, 35.90684722, 38.03820523, ..., 37.9578962 ,
        3.61011028,  6.53631388])

With numpy inside a python loop (this is a nice, simple implementation)

b = np.zeros(n)
for j in range(n):
    b[j] = np.sum(a[ j, bnd_l[j] : bnd_r[j] ])
b

Output

# %timeit 10 loops, best of 5: 29.2 ms per loop
array([ 4.31060848, 35.90684722, 38.03820523, ..., 37.9578962 ,
        3.61011028,  6.53631388])

To verify the results are equal

np.testing.assert_allclose(slice_sum(a, bnd_l, bnd_r), b)

user7138814 · Accepted Answer · 2022-03-28 21:43:11Z

2

Here's a pure numpy solution that gets close to the speed of the posted numba solution. It leverages reduceat but the setup is quite convoluted.

def slice_sum_np(a, left_bnd, right_bnd):
    nx, ny = a.shape
    linear_indices = np.c_[left_bnd, right_bnd] + ny * np.arange(nx)[:,None]
    sums = np.add.reduceat(a.ravel(), linear_indices.ravel())[::2]
    # account for reduceat special case
    sums[left_bnd >= right_bnd] = 0
    return sums

answered Mar 28, 2022 at 21:43

user7138814

2,05112 silver badges12 bronze badges

3 Comments

Michael Szczesny Over a year ago

This is a great solution and surprisingly fast considering that you have to discard half of the computed sums. I'm wondering how numpy speeds up the summation. I would favor this being the accepted answer as numba is a heavy (and sometimes stubborn) dependency.

Giovanni Tardini Over a year ago

I also love solutions involving only conventional packages. I still slightly favour the numba solution though, because it is applicable to a bunch of similar problems, say when having np.average or np.trapz instead of np.sum. (Of course one can extend also this numpy solution to those cases, with some effort). It is also a bit more intuitive to read/maintain.

user7138814 Over a year ago

I agree with both sentiments. If numba becomes a standard/conventional dependency then one can get the best of both :) I have a feeling it's not as difficult to install as it used to be, especialy with conda of course.

Collectives™ on Stack Overflow

Numpy sum of 2D array along axis=1, floating range

2 Answers 2

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related