
I have a (large) data array and a (large) list of lists of (a few) indices, e.g.,

data = [1.0, 10.0, 100.0]
contribs = [[1, 2], [0], [0, 1]]

For each entry in contribs, I'd like to sum up the corresponding values of data and put those into an array. For the above example, the expected result would be

out = [110.0, 1.0, 11.0]

Doing this in a loop works,

import numpy

# Sum the data values for each list of indices
c = numpy.zeros(len(contribs))
for k, indices in enumerate(contribs):
    for idx in indices:
        c[k] += data[idx]

but since data and contribs are large, it's taking too long.

I have the feeling this can be improved using numpy's fancy indexing.

Any hints?

  • Did you try any fancy indexing? Commented Jun 29, 2016 at 18:38
  • Sure, but the fact that the entries of contribs have varying lengths makes it difficult. I didn't get anywhere. Commented Jun 29, 2016 at 18:39

3 Answers


One possibility would be

import numpy as np

data = np.array(data)
out = [np.sum(data[c]) for c in contribs]

Should be faster than the double loop, at least.
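As a quick check with the sample data from the question: the only Python-level loop left is the comprehension over contribs, and data[c] does the fancy indexing for each sublist.

import numpy as np

data = np.array([1.0, 10.0, 100.0])
contribs = [[1, 2], [0], [0, 1]]

out = np.array([np.sum(data[c]) for c in contribs])
print(out)  # [110.   1.  11.]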




Here's an almost vectorized* approach -

import numpy as np

# Get the lengths of the list elements in contribs and their cumulative
# lengths, to be used for creating an ID array later on.
clens = np.cumsum([len(item) for item in contribs])

# Set up an ID array that assigns the same ID to entries coming from the
# same list element in contribs. These IDs are used to accumulate values
# from a corresponding array that is created by indexing into data with
# a flattened contribs.
id_arr = np.zeros(clens[-1], dtype=int)
id_arr[clens[:-1]] = 1
out = np.bincount(id_arr.cumsum(), np.take(data, np.concatenate(contribs)))

This approach involves some setup work, so the benefits should show up when it is fed decently sized input arrays and a decent number of list elements in contribs, since that number corresponds to the loop count of an otherwise loopy solution.

*Please note that this is coined as almost vectorized because the only looping performed here is at the start, where we get the lengths of the list elements. That part is not computationally demanding and should have a minimal effect on the total runtime.
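To see the mechanics, here is what the intermediate arrays look like for the sample input from the question (the values in the comments are for that input only):

import numpy as np

data = [1.0, 10.0, 100.0]
contribs = [[1, 2], [0], [0, 1]]

clens = np.cumsum([len(item) for item in contribs])  # [2 3 5]

id_arr = np.zeros(clens[-1], dtype=int)  # [0 0 0 0 0]
id_arr[clens[:-1]] = 1                   # [0 0 1 1 0]
ids = id_arr.cumsum()                    # [0 0 1 2 2], one ID per flattened entry

flat = np.take(data, np.concatenate(contribs))  # [ 10. 100.   1.   1.  10.]
out = np.bincount(ids, flat)                    # [110.   1.  11.]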



I'm not certain that all cases work, but for your example, with data as a numpy.array:

import numpy

# Flatten "contribs"
flat = [j for i in contribs for j in i]

# Get the start offsets of the ranges of data[flat] that will be summed
# in the next step
offsets = [0] + numpy.cumsum([len(c) for c in contribs]).tolist()[:-1]

# Take the required sums
out = numpy.add.reduceat(data[flat], offsets)
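One caveat worth demonstrating, and presumably why not all cases work: numpy documents that when two consecutive reduceat indices are equal, the result is the single element at that offset rather than an empty sum. An empty sublist in contribs triggers exactly that, as this sketch shows:

import numpy

data = numpy.array([1.0, 10.0, 100.0])
contribs = [[1, 2], [], [0, 1]]  # note the empty sublist

flat = [j for i in contribs for j in i]  # [1, 2, 0, 1]
offsets = [0] + numpy.cumsum([len(c) for c in contribs]).tolist()[:-1]  # [0, 2, 2]

# The middle entry should be 0.0 for the empty sublist, but reduceat
# returns data[flat][2] = 1.0 instead.
print(numpy.add.reduceat(numpy.asarray(data)[flat], offsets))  # [110.   1.  11.]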

