In Python 3, using numpy, I have an array of values and an array of indexes with the same shape. For each distinct value in the index array, I need the sum (or mean) of the elements of the value array at the positions where the index array holds that value. For speed reasons I am looking to do this using numpy array operations instead of for loops.

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])
# Something like desired_sum = numpy.sum(value_array[index_array])
# Where the output will be
desired_sum  = numpy.array([5.1, 10.7, 3, 0, 20])

This will be running on larger arrays (shape of roughly (2000, 2000)) with a few hundred indexes.
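
For reference, the intended result can be written as a plain loop. This is only a minimal sketch for checking vectorized answers against; `grouped_sum_loop` is a hypothetical helper name, not a numpy function, and it assumes the indexes are non-negative integers.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2, 5, 10, 20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

def grouped_sum_loop(values, indexes):
    """Hypothetical reference helper: add each value into the slot named by its index."""
    # One output slot per possible index, from 0 up to the largest index seen
    out = numpy.zeros(indexes.max() + 1)
    for idx, val in zip(indexes.ravel(), values.ravel()):
        out[idx] += val
    return out

loop_sum = grouped_sum_loop(value_array, index_array)
print(loop_sum)
```

This is the slow version I want to avoid, but it pins down what the output should be.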

2 Answers

This is sort of a duplicate of this question. The code for Python 3 is:

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
# index_array must contain integers and have no negative values
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]]).astype(int)
sum_array     = numpy.bincount(index_array.flatten(), weights = value_array.flatten())
average_array = sum_array / numpy.bincount(index_array.flatten())

print('sum_array = ' + repr(sum_array))
print('average_array = ' + repr(average_array))

The output is

sum_array = array([ 5.1, 10.7, 3. , 0. , 20. ])
average_array = array([ 2.55, 3.56666667, 1.5, nan, 20. ])
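
The nan in the average comes from dividing by a zero count for index 3, which also triggers a runtime warning. A possible variation, sketched under the assumption that you know how many output slots you want (`n_bins` is an illustrative value chosen here, not part of the question), is to pass `minlength` to `numpy.bincount` and divide only where the count is nonzero:

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2, 5, 10, 20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

flat_index = index_array.ravel()
flat_value = value_array.ravel()

# Assumed number of output slots; it must be at least flat_index.max() + 1
n_bins = 6

sum_array = numpy.bincount(flat_index, weights=flat_value, minlength=n_bins)
count_array = numpy.bincount(flat_index, minlength=n_bins)

# Start from nan and divide only where a bin is non-empty,
# so empty bins stay nan without a divide-by-zero warning
average_array = numpy.full(n_bins, numpy.nan)
numpy.divide(sum_array, count_array, out=average_array, where=count_array > 0)

print(sum_array)
print(average_array)
```

With `minlength` the output length no longer depends on the largest index that happens to appear in the data, which may be convenient when several arrays share the same set of indexes.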

You could broadcast the indexes and value matrix over the range of resulting positions, adding one more dimension to form multiple masks of the value matrix.

For example, tiling the values gives 5 copies of the 2D matrix, one per output index (0 to 4):

array([[[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]]])

For each of these copies, keep only the values whose entry in index_array equals that copy's output index (using broadcasting over the 0..4 indexes of the output). Given that numpy only processes fixed-size dimensions, selecting/excluding values is done by multiplying by 1 or 0 (the boolean result of the index comparison):

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])

result_index = numpy.arange(5)[:,None,None]

eligible_values = numpy.tile(value_array,(5,1)).reshape(-1,*value_array.shape)
eligible_values *= result_index==index_array
result = numpy.sum(eligible_values,axis=(1,2))

print(result)
[ 5.1 10.7  3.   0.  20. ]

To get averages, you can compute the counts in a similar fashion (you only need to use the indexes for those):

counts = numpy.sum(result_index==index_array,axis=(1,2))

print(counts)
[2 3 2 0 1]

print(result/counts)  # averages
[ 2.55        3.56666667  1.5                nan 20.        ]
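
The explicit tiling is not strictly required: multiplying the boolean mask by the 2D value matrix broadcasts to the same 3D shape. The following is a minimal sketch of that variation; `n_out`, `mask` and `intermediate_bytes` are names introduced here for illustration, and the byte count is only a rough estimate of the largest temporary array.

```python
import numpy

value_array = numpy.array([[0.1, 0.2, 0.5, 1],
                           [2, 5, 10, 20]])
index_array = numpy.array([[0, 1, 1, 2],
                           [2, 0, 1, 4]])

# One output position per index value from 0 to the largest index
n_out = index_array.max() + 1

# Boolean mask of shape (n_out, rows, cols): True where index_array equals that position
mask = numpy.arange(n_out)[:, None, None] == index_array

# Broadcasting multiplies each 2D mask by the value matrix; no tile/reshape needed
result = (mask * value_array).sum(axis=(1, 2))
counts = mask.sum(axis=(1, 2))

# Rough size of the float intermediate produced by mask * value_array
intermediate_bytes = n_out * value_array.size * value_array.itemsize

print(result)
print(counts)
print(intermediate_bytes)
```

Either way, the intermediate grows with the number of output positions times the size of the value matrix. For a 2000 x 2000 float64 array and, say, 300 output positions, that is about 300 * 4,000,000 * 8 bytes, or roughly 9.6 GB, so this approach may only be practical for small inputs or if the output positions are processed in chunks.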
