I have two 2D numpy arrays (simplified in this example with respect to size and content) with identical sizes.

An ID matrix:

1 1 1 2 2
1 1 2 2 5
1 1 2 5 5
1 2 2 5 5
2 2 5 5 5

and a value matrix:

14.8 17.0 74.3 40.3 90.2
25.2 75.9  5.6 40.0 33.7
78.9 39.3 11.3 63.6 56.7
11.4 75.7 78.4 88.7 58.6
79.6 32.3 35.3 52.5 13.3

My goal is to count and sum the values from the second matrix grouped by the IDs from the first matrix:

1: (8, 336.8)
2: (9, 453.4)
5: (8, 402.4)

I can do this in a for loop, but when the matrices are thousands of elements on a side instead of just 5x5, with thousands of unique IDs, it takes a long time to process.

Does numpy have a clever method or a combination of methods for doing this?

3 Answers

Here's a vectorized approach that gets the per-ID counts from ID and the per-ID sums of value by combining np.unique and np.bincount -

unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)

value_sums = np.bincount(idx,value.ravel())

To get the final output as a dictionary, you can use a dictionary comprehension to pair each unique ID with its count and summed value, like so -

{i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}

Sample run -

In [86]: ID
Out[86]: 
array([[1, 1, 1, 2, 2],
       [1, 1, 2, 2, 5],
       [1, 1, 2, 5, 5],
       [1, 2, 2, 5, 5],
       [2, 2, 5, 5, 5]])

In [87]: value
Out[87]: 
array([[ 14.8,  17. ,  74.3,  40.3,  90.2],
       [ 25.2,  75.9,   5.6,  40. ,  33.7],
       [ 78.9,  39.3,  11.3,  63.6,  56.7],
       [ 11.4,  75.7,  78.4,  88.7,  58.6],
       [ 79.6,  32.3,  35.3,  52.5,  13.3]])

In [88]: unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
    ...: value_sums = np.bincount(idx,value.ravel())
    ...: 

In [89]: {i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Out[89]: 
{1: (8, 336.80000000000001),
 2: (9, 453.40000000000003),
 5: (8, 402.40000000000003)}
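
For reference, here is the same idea wrapped in a self-contained function. This is only a sketch: the name group_count_and_sum is made up for illustration, and it flattens ID before calling np.unique. As far as I know, NumPy 2.x returns the inverse index with the same shape as a multi-dimensional input, and np.bincount only accepts 1-D arrays, so flattening first keeps the snippet working on both older and newer releases.

```python
import numpy as np

def group_count_and_sum(ids, values):
    """Hypothetical helper: return {id: (count, sum)} of values grouped by ids."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)
    # Flatten first so the inverse index is 1-D regardless of NumPy version.
    unique_ids, inverse, counts = np.unique(
        ids.ravel(), return_inverse=True, return_counts=True
    )
    # inverse[k] is the position of ids.ravel()[k] in unique_ids, so a
    # weighted bincount adds up the values belonging to each unique ID.
    sums = np.bincount(inverse, weights=values.ravel(),
                       minlength=unique_ids.size)
    return {k.item(): (c.item(), s.item())
            for k, c, s in zip(unique_ids, counts, sums)}

ids = np.array([[1, 1, 1, 2, 2],
                [1, 1, 2, 2, 5],
                [1, 1, 2, 5, 5],
                [1, 2, 2, 5, 5],
                [2, 2, 5, 5, 5]])

values = np.array([[14.8, 17.0, 74.3, 40.3, 90.2],
                   [25.2, 75.9,  5.6, 40.0, 33.7],
                   [78.9, 39.3, 11.3, 63.6, 56.7],
                   [11.4, 75.7, 78.4, 88.7, 58.6],
                   [79.6, 32.3, 35.3, 52.5, 13.3]])

result = group_count_and_sum(ids, values)
for key, (count, total) in result.items():
    print('{}: ({}, {:.1f})'.format(key, count, total))
```

The IDs do not need to be contiguous or start at zero, because np.bincount works on the inverse index rather than on the raw IDs.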

2 Comments

Nice one! I was not aware of the return_* arguments for np.unique.
@Divakar: Thank You! This was exactly the kind of solution I was looking for with a good performance due to the vectorisation.

This is possible with a combination of a few simple methods:

  1. use numpy.unique to find each ID
  2. create a boolean mask for each ID
  3. sum the 1s in the mask (count) and the values where the mask is 1

For example:

import numpy as np

ids = np.array([[1, 1, 1, 2, 2],
                [1, 1, 2, 2, 5],
                [1, 1, 2, 5, 5],
                [1, 2, 2, 5, 5],
                [2, 2, 5, 5, 5]])

values = np.array([[14.8, 17.0, 74.3, 40.3, 90.2],
                   [25.2, 75.9,  5.6, 40.0, 33.7],
                   [78.9, 39.3, 11.3, 63.6, 56.7],
                   [11.4, 75.7, 78.4, 88.7, 58.6],
                   [79.6, 32.3, 35.3, 52.5, 13.3]])


for i in np.unique(ids):  # loop through all IDs
    mask = ids == i  # find entries that match current ID
    count = np.sum(mask)  # number of matches
    total = np.sum(values[mask])  # values of matches
    print('{}: ({}, {:.1f})'.format(i, count, total))  # print result

# Output:
# 1: (8, 336.8)
# 2: (9, 453.4)
# 5: (8, 402.4)

3 Comments

It's exactly that nasty for loop I was referring to in my question; I should have been clearer about that, though.
I think there is not really a succinct way of doing that without the for loop. It may be possible, but would likely lead to very unreadable code. If you only have a few unique IDs, the for loop should not cause too big a performance hit. Anyway, I will think about it for a while...
Looks like I was just proven wrong in Divakar's answer.

The numpy_indexed package (disclaimer: I am its author) has functionality to solve this kind of problem in an elegant and vectorized manner:

import numpy_indexed as npi
group_by = npi.group_by(ID.flatten())  # group the flattened IDs once
ID_unique, value_sums = group_by.sum(value.flatten())  # per-ID sums
ID_count = group_by.count  # per-ID counts

Note: if you want to compute the sum and count in order to compute a mean, there is also group_by.mean; plus a lot of other useful functionality.
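
If installing an extra package is not an option, a per-ID mean can also be sketched with plain NumPy by dividing the grouped sums by the grouped counts. The function name group_mean and the small demo arrays below are made up for illustration; this is not the numpy_indexed implementation.

```python
import numpy as np

def group_mean(ids, values):
    """Hypothetical helper: return (unique_ids, means) of values grouped by ids."""
    ids = np.asarray(ids).ravel()
    values = np.asarray(values, dtype=float).ravel()
    unique_ids, inverse, counts = np.unique(
        ids, return_inverse=True, return_counts=True
    )
    # Weighted bincount gives the per-ID sum; dividing by the per-ID count
    # gives the mean. Every unique ID occurs at least once, so counts > 0.
    sums = np.bincount(inverse, weights=values, minlength=unique_ids.size)
    return unique_ids, sums / counts

# Tiny made-up example: ID 1 appears twice, IDs 2 and 5 once each.
demo_ids = np.array([[1, 1],
                     [2, 5]])
demo_values = np.array([[1.0, 3.0],
                        [10.0, 7.0]])

demo_unique, demo_means = group_mean(demo_ids, demo_values)
print(demo_unique, demo_means)
```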
