Sum of numpy array based on array of indexes

Question

In Python 3, using numpy, I have an array of values and an array of indexes of the same shape. For each index value, I need the sum (or mean) of all entries in the value array whose corresponding entry in the index array equals that index. For speed reasons I am looking to do this with vectorized numpy operations (slicing/indexing) instead of for loops.

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])
# Something like desired_sum = numpy.sum(value_array[index_array])
# Where the output will be
desired_sum  = numpy.array([5.1, 10.7, 3, 0, 20])

This will be running on larger arrays (shape roughly (2000, 2000)) with a few hundred distinct indexes.
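A plain-loop reference version may help pin down the intended result before vectorizing. The following is a minimal sketch, assuming the indexes are non-negative integers and the output has one slot per index from 0 to the maximum index. The function name `loop_sum` is hypothetical. It is too slow for (2000, 2000) inputs but useful for checking faster versions.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2,   5,   10,  20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

def loop_sum(values, indexes):
    """Hypothetical reference version: add each value into the slot named by its index."""
    n_slots = int(indexes.max()) + 1          # slots 0..max index, empty slots stay 0
    totals = numpy.zeros(n_slots)
    for value, idx in zip(values.ravel(), indexes.ravel()):
        totals[idx] += value                  # e.g. slot 0 collects 0.1 and 5
    return totals

print(loop_sum(value_array, index_array))
```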

Christopher Pratt · Accepted Answer · 2023-04-21 00:22:14Z


This is sort of a duplicate of this question. The code for Python 3 is:

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
# index_array must contain integers and have no negative values
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]]).astype(int)
sum_array     = numpy.bincount(index_array.flatten(), weights = value_array.flatten())
average_array = sum_array / numpy.bincount(index_array.flatten())

print('sum_array = ' + repr(sum_array))
print('average_array = ' + repr(average_array))

The output is

sum_array = array([ 5.1, 10.7, 3. , 0. , 20. ])
average_array = array([ 2.55, 3.56666667, 1.5, nan, 20. ])
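The `average_array` line divides by zero for index 3, which typically also emits a NumPy `RuntimeWarning`. Below is a sketch of the same `bincount` approach with two assumed tweaks:

- `minlength` fixes the output length (useful if the number of slots is known in advance).
- The division is restricted to slots that actually occur.

The function name `grouped_sum_and_mean` and its `n_slots` parameter are hypothetical.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2,   5,   10,  20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

def grouped_sum_and_mean(values, indexes, n_slots=None):
    """Hypothetical helper: sum and mean of `values` grouped by non-negative integer `indexes`."""
    flat_idx = indexes.ravel()
    flat_val = values.ravel()
    if n_slots is None:
        n_slots = int(flat_idx.max()) + 1
    # weights= turns the per-index count into a per-index sum of the values
    sums = numpy.bincount(flat_idx, weights=flat_val, minlength=n_slots)
    counts = numpy.bincount(flat_idx, minlength=n_slots)
    # Divide only where an index occurs; empty slots are left as NaN, without a warning
    means = numpy.full(n_slots, numpy.nan)
    numpy.divide(sums, counts, out=means, where=counts > 0)
    return sums, counts, means

sums, counts, means = grouped_sum_and_mean(value_array, index_array)
print(sums)
print(counts)
print(means)
```

This stays linear in the number of array elements regardless of how many distinct indexes there are, which matters for the (2000, 2000) case.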


Alain T. · Accepted Answer · 2023-04-23 07:50:22Z

You could broadcast the index and value matrices over the range of output positions, adding one more dimension so that each output position gets its own masked copy of the value matrix.

For example, with 5 output positions, tiling the values gives 5 copies of the 2D matrix:

array([[[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]]])

For each of these copies, keep only the values whose entry in index_array equals that copy's output index (broadcasting the comparison over the output indexes 0..4). Since numpy arrays have fixed-size dimensions, values are selected or excluded by multiplying by 1 or 0 (the boolean result of the index comparison):

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])

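# Output positions 0..4 with shape (5, 1, 1), so they broadcast against the (2, 4) index_array
result_index = numpy.arange(5)[:,None,None]

# 5 stacked copies of value_array, shape (5, 2, 4)
eligible_values = numpy.tile(value_array,(5,1)).reshape(-1,*value_array.shape)
# In copy k, zero out every value whose index is not k
eligible_values *= result_index==index_array
# Sum each masked copy to get one total per output position
result = numpy.sum(eligible_values,axis=(1,2))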

print(result)
[ 5.1 10.7  3.   0.  20. ]
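The `numpy.tile` step is not strictly required. Multiplying `value_array` by the broadcast boolean mask already produces the same stacked product. The stacked intermediate still holds one full copy of the matrix per output position, though. For a (2000, 2000) float64 array that is 32 MB per position, so around 300 positions would need roughly 10 GB at once.

The sketch below keeps the mask-and-sum idea but processes output positions in batches to bound peak memory. The function name `masked_sum` and the `chunk` parameter are hypothetical, and the batch size is an arbitrary assumption.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2,   5,   10,  20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

def masked_sum(values, indexes, n_slots, chunk=64):
    """Hypothetical helper: broadcast-and-mask grouped sum, `chunk` output positions at a time."""
    result = numpy.empty(n_slots)
    for start in range(0, n_slots, chunk):
        stop = min(start + chunk, n_slots)
        # Shape (k, 1, 1): one candidate output position per leading slice
        labels = numpy.arange(start, stop)[:, None, None]
        # (k, 1, 1) == (rows, cols) broadcasts to a (k, rows, cols) boolean mask;
        # multiplying by it zeroes every value whose index differs (no tile needed)
        masked = values * (labels == indexes)
        result[start:stop] = masked.sum(axis=(1, 2))
    return result

print(masked_sum(value_array, index_array, n_slots=5, chunk=2))
```

Even with batching, the total work grows with (number of output positions) × (number of elements). The `numpy.bincount` approach needs only one pass over the elements.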

To get averages, you can compute the counts in a similar fashion (you only need to use the indexes for those):

counts = numpy.sum(result_index==index_array,axis=(1,2))

print(counts)
[2 3 2 0 1]

print(result/counts)  # averages
[ 2.55        3.56666667  1.5                nan 20.        ]
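Another option, not used in either answer, is NumPy's unbuffered scatter-add `numpy.add.at`. It accumulates repeated indexes correctly, whereas `sums[index_array] += value_array` would let repeated indexes overwrite each other. It also generalizes to accumulating into multidimensional targets, which `bincount` does not cover. It has historically been slower than `bincount` for this 1D case, so treat the following as an alternative sketch rather than a recommendation. The variable names are illustrative.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2,   5,   10,  20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

n_slots = int(index_array.max()) + 1
sums = numpy.zeros(n_slots)
counts = numpy.zeros(n_slots, dtype=numpy.int64)

# Unbuffered scatter-add: every (index, value) pair is added, even when indexes repeat
numpy.add.at(sums, index_array, value_array)
numpy.add.at(counts, index_array, 1)

# Averages only where an index occurs; slot 3 stays NaN without a division warning
means = numpy.divide(sums, counts, out=numpy.full(n_slots, numpy.nan), where=counts > 0)
print(sums, counts, means)
```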
