
Sorry, I don't know the protocol for re-asking a question that didn't get an answer. This question was asked a few months ago here: Numpy sum between pairs of indices in 2d array

I have a 2-d numpy array (MxN) and two more 1-d arrays (Mx1) that hold, for each row of the 2-d array, the starting and ending indices of the slice I'd like to sum over. I'm looking for the most efficient way to do this on a large array (preferably without a loop, which is what I'm currently using). An example of what I'd like to do is the following.

>>> from numpy import *
>>> random.seed(1234)
>>> a = random.rand(4,4)
>>> print a
[[ 0.19151945  0.62210877  0.43772774  0.78535858]
 [ 0.77997581  0.27259261  0.27646426  0.80187218]
 [ 0.95813935  0.87593263  0.35781727  0.50099513]
 [ 0.68346294  0.71270203  0.37025075  0.56119619]]
>>> b = array([1,0,2,1])
>>> c = array([3,2,4,4])
>>> d = empty(4)
>>> for i in xrange(4):
...     d[i] = sum(a[i, b[i]:c[i]])
...

>>> print d
[ 1.05983651  1.05256841  0.8588124   1.64414897]

My problem is similar to the following question; however, I don't think the solution presented there would be very efficient: Numpy sum of values in subarrays between pairs of indices. In that question, they want to find the sum of multiple subsets of the same row, so cumsum() can be used. However, I will only be finding one sum per row, so I don't think that would be the most efficient way to compute my sums.

1 Comment
The cumsum, reduceat, or similar tricks may be extremely competitive if you are summing small enough arrays, or if the part you sum is not a very small portion of the array. All these tricks will never lose you more than a factor of two or so... so even if you add up twice as much as necessary, you are still very fast. In other words, it depends on what you are doing; in your case, your solution may very well be the fastest you can easily get with numpy. Commented Jan 14, 2013 at 22:26
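
For concreteness, the reduceat trick mentioned in this comment could look something like the sketch below. It is my own sketch, not from the thread (the name sliced_sum_reduceat is mine); it works on the flattened array and assumes 0 <= b[i] <= c[i] <= N:

import numpy as np

def sliced_sum_reduceat(a, b, c):
    # row i's slice [b[i]:c[i]) becomes [i*N + b[i], i*N + c[i]) in the
    # flattened array
    M, N = a.shape
    starts = np.arange(M) * N + b
    ends = np.arange(M) * N + c
    idx = np.column_stack((starts, ends)).ravel()  # s0, e0, s1, e1, ...
    flat = np.append(a.ravel(), 0.0)   # pad so an end index of M*N is legal
    sums = np.add.reduceat(flat, idx)[::2]  # keep only the [start:end) pieces
    # reduceat yields flat[s] (not 0) for an empty segment, so patch those
    return np.where(b < c, sums, 0.0)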

4 Answers


EDIT: Added timing results for all the answers so far, including the OP's code following @seberg's comment below, and the OP's method is the fastest:

def sliced_sum_op(a, b, c) :
    # the OP's loop, with np.sum instead of the Python builtin sum
    d = np.empty(a.shape[0])
    for i in xrange(a.shape[0]):
        d[i] = np.sum(a[i, b[i]:c[i]])
    return d

You can still get it done with np.cumsum, with a big speed boost, although it will require storage equivalent to the size of your original array:

def sliced_sum(a, b, c) :
    # cumulative sum along each row, with a column of zeros prepended so
    # that cum[i, j] holds a[i, :j].sum()
    cum = np.cumsum(a, axis=1)
    cum = np.hstack((np.zeros((a.shape[0], 1), dtype=a.dtype), cum))
    # the sum over a[i, b[i]:c[i]] is then a single difference per row
    rows = np.arange(a.shape[0])
    return cum[rows, c] - cum[rows, b]
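
A quick sanity check against the example in the question (a seeded RandomState reproduces the question's numbers):

a = np.random.RandomState(1234).rand(4, 4)
b = np.array([1, 0, 2, 1])
c = np.array([3, 2, 4, 4])
print sliced_sum(a, b, c)  # [ 1.05983651  1.05256841  0.8588124   1.64414897]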

Timings are deceptive for your example array, because your method is actually slightly faster than this one at small array sizes. But numpy soon wins out; see the graph below for timings on random square arrays of size (n, n):

[Plot: timing comparison of the four methods on random (n, n) arrays]

The above was generated with

import timeit
import numpy as np
import matplotlib.pyplot as plt

# sum_between2 and sum_between_mmult are defined in the other answers below

n = np.arange(10, 1000, 10)
op = np.zeros(n.shape[0])
me = np.zeros(n.shape[0])
th = np.zeros(n.shape[0])
jp = np.zeros(n.shape[0])
for j, size in enumerate(n) :
    a = np.random.rand(size, size)
    b, c = indices = np.sort(np.random.randint(size + 1,
                                               size=(2, size)), axis=0)
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sliced_sum(a, b, c))
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sum_between2(a, b, c))
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sum_between_mmult(a, b, c))

    op[j] = timeit.timeit('sliced_sum_op(a, b, c)',
                          'from __main__ import sliced_sum_op, a, b, c',
                          number=10)
    me[j] = timeit.timeit('sliced_sum(a, b, c)',
                          'from __main__ import sliced_sum, a, b, c',
                          number=10)
    th[j] = timeit.timeit('sum_between2(a, b, c)',
                          'from __main__ import sum_between2, a, b, c',
                          number=10)
    jp[j] = timeit.timeit('sum_between_mmult(a, b, c)',
                          'from __main__ import sum_between_mmult, a, b, c',
                          number=10)
plt.subplot(211)
plt.plot(n, op, label='op')
plt.plot(n, me, label='jaime')
plt.plot(n, th, label='thorsten')
plt.plot(n, jp, label='japreiss')
plt.xlabel('n')
plt.legend(loc='best')
plt.show()

3 Comments

Clever, +1... a slight remark, though: I guess it should be rows = np.arange(a.shape[0])... it worked for you because a was square.
To be honest, I doubt the timing for the OP's solution is right... especially for large arrays, it should be very competitive... maybe you used the Python sum (you could also use .sum() to save a lookup)?
@seberg You were absolutely right... In fact, it's so competitive that it is the fastest once you replace sum with np.sum. See the graph above. Do you want to add the correct answer, or should I edit mine to reflect that?

I like @Jaime's answer, but here is another approach: you can recast the problem as a matrix multiplication.

If you multiply a by a vector of all ones, each element of the resulting vector will contain the sum of the corresponding row of a. To get the d you need, you can mask out the elements that are excluded from each row, and then multiply by a vector of all ones.

def sum_between_mmult(ar, b, c):
    copy = np.copy(ar)
    nrows = ar.shape[0]
    ncols = ar.shape[1]
    # zero out everything outside [b[i], c[i]) in each row
    for i in range(nrows):
        copy[i, :b[i]] = 0
        copy[i, c[i]:] = 0
    # multiplying by a vector of ones then sums each row
    onevec = np.ones(ncols)
    return np.dot(copy, onevec)

Same as @Jaime, I only saw the speedup for larger matrix sizes. I have the feeling that some kind of fancy indexing trick could get rid of the for loop and give a greater speedup; a sketch of one possibility follows. If you don't need the original array, you can overwrite it instead of making a copy, but this didn't yield much speedup in my tests.
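
One way to drop that loop (a sketch of mine, not part of the original answer; the name sum_between_mmult_noloop is hypothetical) is to build the mask by broadcasting a column-index vector against b and c, and fold the masking into the multiplication:

def sum_between_mmult_noloop(ar, b, c):
    # True inside [b[i], c[i]) for each row, built without a Python loop
    cols = np.arange(ar.shape[1])
    inside = (cols >= b[:, None]) & (cols < c[:, None])
    # zeroing the excluded entries and multiplying by ones sums each slice
    return np.dot(ar * inside, np.ones(ar.shape[1]))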



I have another way to calculate your result which works and does not use loops, but it is not really faster than the loop approach.

import time
import numpy as np

def sum_between1(ar, idc_l, idc_u):
    d = np.empty(ar.shape[0])
    for i in xrange(ar.shape[0]):
        d[i] = sum(ar[i, idc_l[i]:idc_u[i]])
    return d

def sum_between2(ar, idc_l, idc_u):
    # column index of every entry, one row of indices per row of ar
    indices = np.arange(ar.shape[1]).reshape(1,-1).repeat(ar.shape[0], axis=0)
    lower = idc_l.reshape(-1,1).repeat(ar.shape[1], axis=1)
    upper = idc_u.reshape(-1,1).repeat(ar.shape[1], axis=1)
    # mask everything outside [lower, upper), then sum the unmasked entries
    mask = ~((indices >= lower) & (indices < upper))
    masked = np.ma.MaskedArray(ar, mask)
    return masked.sum(axis=1)

np.random.seed(1234)
a = np.random.rand(4,4)
print a
b = np.array([1,0,2,1])
c = np.array([3,2,4,4])

t0 = time.time()
for i in range(100000):
    d1 = sum_between1(a,b,c)
print "sum_between1: %.3f seconds" % (time.time()-t0)
print d1

t0 = time.time()
for i in range(100000):
    d2 = sum_between2(a,b,c)
print "sum_between2: %.3f seconds" % (time.time()-t0)
print d2

The output for me was (note: this output came from a run with a 40x40 test array, not the 4x4 setup shown above):

[[ 0.19151945  0.62210877  0.43772774 ...,  0.92486763  0.44214076
   0.90931596]
 [ 0.05980922  0.18428708  0.04735528 ...,  0.53585166  0.00620852
   0.30064171]
 [ 0.43689317  0.612149    0.91819808 ...,  0.18258873  0.90179605
   0.70652816]
 ..., 
 [ 0.70568819  0.76402889  0.34460786 ...,  0.6933128   0.07778623
   0.4040815 ]
 [ 0.51348689  0.80706629  0.09896631 ...,  0.91118062  0.87656479
   0.96542923]
 [ 0.20231131  0.72637586  0.57131802 ...,  0.5661444   0.14668441
   0.09974442]]
sum_between1: 2.263 seconds
[ 1.05983651  0.24409631  1.54393475  2.27840642  1.65049179  1.86027107
  0.74002457  0.91248001  1.29180203  1.03592483  0.30448954  0.78028893
  1.15511632  1.74568981  1.0551406   1.73598504  1.32397106  0.22902658
  0.77533999  2.11800627  1.09181484  0.92074516  1.04588589  2.07584895
  1.13615918  1.33172081  1.41323751  2.01996291  1.69677797  0.57592999
  1.18049304  1.13052798  0.90715138  0.63876336  1.76712974  1.15138181
  0.29005541  1.46971707  0.57149804  1.8816212 ]
sum_between2: 1.817 seconds
[1.05983651005 0.244096306594 1.54393474534 2.27840641818 1.65049178537
 1.86027106627 0.740024568268 0.91248000774 1.29180203183 1.03592482812
 0.304489542783 0.78028892993 1.1551163203 1.74568980609 1.05514059758
 1.73598503833 1.32397105753 0.229026581839 0.77533999391 2.11800626878
 1.09181484127 0.92074516366 1.04588588779 2.07584895325 1.13615918351
 1.33172081033 1.41323750936 2.01996291037 1.69677797485 0.575929991717
 1.18049303662 1.13052797976 0.907151384823 0.638763358104 1.76712974497
 1.15138180543 0.290055405809 1.46971707447 0.571498038664 1.88162120474]

I post this answer because maybe someone else has an idea for how to improve my approach and make it faster; one candidate refinement is sketched below.
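
Taking up that invitation, a possible refinement (my sketch, similar in spirit to the loop-free mask in @japreiss's answer above; the name sum_between3 is mine) is to replace the three repeat() copies with broadcasting, and to avoid the MaskedArray overhead with np.where:

def sum_between3(ar, idc_l, idc_u):
    # broadcast the column indices against the per-row bounds; nothing is
    # materialized the way repeat() materializes full copies
    idx = np.arange(ar.shape[1])
    inside = (idx >= idc_l[:, None]) & (idx < idc_u[:, None])
    # keep entries inside [idc_l, idc_u), zero the rest, and sum each row
    return np.where(inside, ar, 0).sum(axis=1)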

2 Comments

There is a bug in your code: you are calling the OP's function both times. When you fix that, your code is much slower for small sizes, but much, much faster for large arrays. But the biggest problem is that right now it does not return the same as the OP's algorithm...
Thanks for your feedback! I really didn't see this typical copy-and-paste bug, and therefore also didn't notice the bug in my function. Both are fixed now; it was only a 4-character fix ;-) Maybe you want to include this function in your graph now?

This is around 25% faster:

def zip_comp(a,b,c):
    # loop over rows in plain Python, letting zip pair each row with its bounds
    return [np.sum(aa[bb:cc]) for aa, bb, cc in zip(a,b,c)]

If you are able to refactor the earlier code so that, instead of producing two lists of slice bounds, it produces a single binary 2D array, then you could make really significant gains using the latter half of @japreiss's method, or something similar. The slowdown in all these methods is the time spent messing around with the crazy indexing.

Speed comparison, using Jaime's timing code:

[Plot: timing comparison including zip_comp, generated with Jaime's benchmark]
