
I have a huge matrix of values and I want to distribute them on a grid and compute the mean of each box of the grid. At the moment I loop over all the values, but I am looking for a vectorized way to do this to reduce execution time.

import numpy as np    

values = np.arange(0,1000)

ind_x = (values//10)%3
ind_y = values%3

box_sum = np.zeros((3,3))
box_nb = np.zeros((3,3))

for v in range(0,len(values)):
    box_sum[ind_x[v],ind_y[v]] += values[v] 
    box_nb[ind_x[v],ind_y[v]] += 1

box_mean = np.divide(box_sum,box_nb)

In this example ind_x and ind_y are built arithmetically, but in the real application they may be random values. Any ideas?

2 Answers


You can use np.bincount, like so -

ids = ind_x*3 + ind_y # Generate 1D linear index IDs for use with bincount

box_sum = np.bincount(ids,values,minlength=9).reshape(3,3)
box_nb = np.bincount(ids,minlength=9).reshape(3,3)
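
The per-box mean is then the elementwise ratio of the two. One caveat worth adding here (a small sketch on top of the answer's code, not part of the original): empty boxes have a count of zero, so guarding the division keeps NumPy from warning about 0/0 -

with np.errstate(invalid='ignore', divide='ignore'):
    box_mean = box_sum / box_nb # empty boxes come out as nan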

Sample run -

1) Set up the inputs and run the original code:

In [59]: # Let's use random numbers for variety, as the OP states:
         # "..  in the application it may be random values"
    ...: values = np.random.randint(0,1000,(1000))
    ...: 
    ...: # Rest of the code same as the one posted within the question
    ...: ind_x = (values//10)%3
    ...: ind_y = values%3
    ...: 
    ...: box_sum = np.zeros((3,3))
    ...: box_nb = np.zeros((3,3))
    ...: 
    ...: for v in range(0,len(values)):
    ...:     box_sum[ind_x[v],ind_y[v]] += values[v] 
    ...:     box_nb[ind_x[v],ind_y[v]] += 1
    ...:     

In [60]: box_sum
Out[60]: 
array([[ 64875.,  50268.,  50496.],
       [ 48759.,  61661.,  53575.],
       [ 53076.,  48529.,  76576.]])

In [61]: box_nb
Out[61]: 
array([[ 125.,  105.,   96.],
       [  97.,  116.,  116.],
       [  96.,  100.,  149.]])

2) Use the proposed approach and verify the results:

In [62]: ids = ind_x*3 + ind_y

In [63]: np.bincount(ids,values,minlength=9).reshape(3,3)
Out[63]: 
array([[ 64875.,  50268.,  50496.],
       [ 48759.,  61661.,  53575.],
       [ 53076.,  48529.,  76576.]])

In [64]: np.bincount(ids,minlength=9).reshape(3,3)
Out[64]: 
array([[125, 105,  96],
       [ 97, 116, 116],
       [ 96, 100, 149]])
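
To gauge the speed-up on your own machine, here is a minimal timing sketch (the wrapper functions are mine, added for illustration; they are not from the original post) -

import numpy as np
from timeit import timeit

values = np.random.randint(0,1000,(1000))
ind_x = (values//10)%3
ind_y = values%3

def loop_version():
    box_sum = np.zeros((3,3))
    box_nb = np.zeros((3,3))
    for v in range(len(values)):
        box_sum[ind_x[v],ind_y[v]] += values[v]
        box_nb[ind_x[v],ind_y[v]] += 1
    return box_sum, box_nb

def bincount_version():
    ids = ind_x*3 + ind_y
    box_sum = np.bincount(ids,values,minlength=9).reshape(3,3)
    box_nb = np.bincount(ids,minlength=9).reshape(3,3)
    return box_sum, box_nb

print(timeit(loop_version, number=100))
print(timeit(bincount_version, number=100))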

2 Comments

It's perfect! The time saving is real. Thanks a lot, Divakar.
@Vince np.bincount is one of the fastest tools in NumPy! So, not surprised at all :)

The numpy_indexed package (disclaimer: I am its author) can be used to solve such problems in an efficient manner:

import numpy_indexed as npi
(unique_x, unique_y), mean = npi.group_by((ind_x, ind_y)).mean(values)

I suspect the bincount solution is faster for a relatively dense grid, because this approach operates on a sparse grid: what you get back is a tuple of index arrays marking the boxes where a mean was computed, plus a matching array of means. But that can be a big advantage if your grid is in fact quite sparse (as you say, the indices may be 'random', or at least not as structured in practice).
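
If you do need the result on a dense grid, the sparse output can be scattered back in, assuming unique_x and unique_y are integer bin indices (a minimal sketch, not part of the package API):

import numpy as np

box_mean = np.full((3,3), np.nan) # boxes that received no values stay nan
box_mean[unique_x, unique_y] = mean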

Also, this is more flexible; group_by allows you to compute a variety of statistics, for keys of various dtypes and value arrays of higher dimensions.
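
For comparison, the same sparse grouping can also be sketched in plain NumPy via np.unique and np.bincount (a minimal sketch assuming the combined key is a small integer; the variable names are mine):

import numpy as np

keys = ind_x*3 + ind_y # one combined 1D key per value
unique_keys, inverse = np.unique(keys, return_inverse=True)
group_means = np.bincount(inverse, values) / np.bincount(inverse) # mean per occupied box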

2 Comments

Thanks for the answer, but I need a robust program that I can hand to my colleagues easily.
Is pip install numpy-indexed not easy enough? :) I would say the test suite could still use some work, but I'm using numpy-indexed in various production cases myself, so I'm pretty confident about its robustness.
