1

I have a [n x n] matrix containing values belonging to different groups, and a [1 x n] vector defining to which group each element belongs. (n usually ~1E4, in this example n=4)

I want to calculate a matrix obtained by summing together all the elements belonging to the same group.

I use np.where() to calculate the indices where the elements of each group are located. When I use the calculated indices, I do not obtain the expected elements, because I select pairs of positions instead of ranges (I'm used to Matlab, where I can simply select M(idx1,idx2) ).

import numpy as np

n=4
M = np.random.rand(n,n)
print(M)

# This vector defines to which group each element belong
belongToGroup = np.array([0, 1, 0, 2])

nGroups=np.max(belongToGroup);

# Calculate a matrix obtained by summing elements belonging to the same group
M_sum = np.zeros((nGroups+1,nGroups+1))
for g1 in range(nGroups+1):
    idxG1 = np.where(belongToGroup==g1)
    for g2 in range(nGroups+1):
        idxG2 = np.where(belongToGroup==g2)
        print('g1 = ' + str(g1))
        print('g2 = ' + str(g2))
        print(idxG1[0])
        print(idxG2[0])
        print(M[idxG1[0],idxG2[0]])
        print(np.sum(M[idxG1[0],idxG2[0]]))
        M_sum[g1,g2]=np.sum(M[idxG1[0],idxG2[0]])

print('')
print('Example of the problem:')
print('Elements I would like to sum to obtain M_sum[0,0]')
print(M[0:2,0:2])
print('Elements that are summed instead')
print(M[[0,1],[0,1]])

Example of the problem: In the example above, the element M_sum[0,0] should be the sum of M[0,0], M[0,1], M[1,0], and M[1,1] Instead, it is calculated as the sum of M[0,0] and M[1,1]

2 Answers 2

2

In MATLAB, indexing with 2 lists (actually matrices) picks a block. numpy on the other hand tries to broadcast the indexing arrays against each other, and returns selected points. Its behavior is close to what sub2ind does in MATLAB.

In [971]: arr = np.arange(16).reshape(4,4)                                      
In [972]: arr                                                                   
Out[972]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [973]: i1, i2 = np.array([0,2,3]), np.array([1,2,0])                         

Indexing with 2 1d arrays of the same size:

In [974]: arr[i1,i2]
Out[974]: array([ 1, 10, 12])

This, in effect returns [arr[0,1], arr[2,2], arr[3,0]], one element for each point of matching indices.

But if I turn one index into a 'column vector', it selects from rows, while i2 selects from columns.

In [975]: arr[i1[:,None], i2]                                                   
Out[975]: 
array([[ 1,  2,  0],
       [ 9, 10,  8],
       [13, 14, 12]])

MATLAB makes the block indexing easy, while individual access is harder. In numpy the block access is a bit harder, though the underlying mechanics is the same.

With your example i1[0] and i2[0] can be arrays like:

array([0, 2]), array([3])
(2,) (1,)

The shape (1,) array can broadcast with the (2,) or with a (2,1) array just as well. Your code would fail if instead is[0] were np.array([0,1,2]), a (3,) array which can't pair with the (2,) array. But with a (2,1) it produces a (2,3) block.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for the reply. I tried changing the following line: M_sum[g1,g2]=np.sum(M[idxG1[0],idxG2[0]]) Into the following: M_sum[g1,g2]=np.sum(M[idxG1[:,None],idxG2[0]]) Maybe I misunderstood. I get an error "tuple indices must be integers, not tuple"
You still need to use the [0]. idxG1 produced by where is a tuple, one element per dimension of the source. idxG1[0] gives you the indexing array for the 1st dimension. idxG1[0][:,None] adds a dimension to that. That ":,None" expression is actually a tuple. In Python, almost every expression containing a comma is a tuple :)
1

You can use np.ix_ to obtain matlab-ish behavior:

A = np.arange(9).reshape(3, 3)
A[[1,2],[0,2]]
# array([3, 8])
A[np.ix_([1,2],[0,2])]
# array([[3, 5],
#        [6, 8]])

Under the hood, np.ix_ does what @hpaulj describes in detail:

np.ix_([1,2],[0,2])
# (array([[1],
#        [2]]), array([[0, 2]]))

You can apply this to your specific problem as follows:

M = np.random.randint(0, 10, (n, n))
M
# array([[6, 2, 7, 1],
#        [6, 7, 9, 5],
#        [9, 4, 3, 2],
#        [3, 1, 7, 9]])
idx = np.array([0, 1, 0, 2])

ng = idx.max() + 1
out = np.zeros((ng, ng), M.dtype)
np.add.at(out, np.ix_(idx, idx), M)
out
# array([[25,  6,  3],
#        [15,  7,  5],
#        [10,  1,  9]])

As an aside: There is a faster but less obvious solution that relies on flat indexing:

np.bincount(np.ravel_multi_index(np.ix_(idx, idx), (ng, ng)).ravel(), M.ravel(), ng*ng).reshape(ng, ng)
# array([[25.,  6.,  3.],
#        [15.,  7.,  5.],
#        [10.,  1.,  9.]])

1 Comment

Thank you, this worked too. To summarize: WRONG: M_sum[g1,g2]=np.sum(M[idxG1[0],idxG2[0]]) CORRECT 1 (thank you @hpaulj ) M_sum[g1,g2]=np.sum(M[idxG1[0][:,None],idxG2[0]]) CORRECT 2 (thank you Paul Panzer ) M_sum[g1,g2]=np.sum(M[np.ix_(idxG1[0],idxG2[0])])

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.