0

I am in the following situation - I have the following:

  • Multidimensional numpy array a of n dimensions
  • t, an array of k rows (tuples), each with n elements. In other words, each row in this array is an index in a

What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.

Seems trivial enough. The following approach, however, does not work

def get(a, t):
    # wrong result + takes way too long
    return a[t]

I have to resort to doing this iteratively i.e. the following works correctly:

def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]

        # a_scalar is now a scalar
        res.append(a_scalar)

    return res

This works, except for the fact that given that each dimension in a has over 30 elements, the procedure does get really slow when n gets to more than 5. I understand that it would be slow regardless, however, I would like to exploit numpy's capabilities as I believe it would speed up this process considerably.

1
  • This is a good example for why you should only use code markup for code and italics for variable names. Commented Dec 16, 2021 at 14:24

2 Answers 2

1

The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, list and arrays convey different information.

In [1]: a = np.arange(12).reshape(3,4)                                          
In [2]: t = np.array([(0,0),(1,1),(2,2)])                                       

In [4]: a                                                                       
Out[4]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [5]: t                                                                       
Out[5]: 
array([[0, 0],
       [1, 1],
       [2, 2]])

You tried:

In [6]: a[t]                                                                    
Out[6]: 
array([[[ 0,  1,  2,  3],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [ 8,  9, 10, 11]]])

So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.

In [10]: a[tuple(t[0])]                # a[(0,0)]                                         
Out[10]: 0
In [11]: a[tuple(t[1])]                # a[(1,1)]                                         
Out[11]: 5
In [12]: a[tuple(t[2])]                                                         
Out[12]: 10

or doing all at once:

In [13]: a[(t[:,0], t[:,1])]                                                      
Out[13]: array([ 0,  5, 10])

Another way to write it, is n lists (or arrays), one for each dimension:

In [14]: a[[0,1,2],[0,1,2]]                                                     
Out[14]: array([ 0,  5, 10])

In [18]: tuple(t.T)                                                             
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]                                                          
Out[19]: array([ 0,  5, 10])

More generally, in a[idx1, idx2] array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].

Using the ideas in [10] and following, your get could be sped up with:

In [20]: def get(a, t): 
    ...:     res = [] 
    ...:     for ind in t: 
    ...:         res.append(a[tuple(ind)])  # index all dimensions at once 
    ...:     return res 
    ...:                                                                        
In [21]: get(a,t)                                                               
Out[21]: [0, 5, 10]

If t really was a list of tuples (as opposed to an array built from them), your get could be:

In [23]: tl = [(0,0),(1,1),(2,2)]                                               
In [24]: [a[ind] for ind in tl]                                                 
Out[24]: [0, 5, 10]
Sign up to request clarification or add additional context in comments.

Comments

0

Explore using np.ravel_multi_index

Create some test data

arr = np.arange(10**4)
arr.shape=10,10,10,10
t = []
for j in range(5):
    t.append( tuple(np.random.randint(10, size = 4)))

print(t)
# [(1, 8, 2, 0),
#  (2, 3, 3, 6),
#  (1, 4, 8, 5),
#  (2, 2, 6, 3),
#  (0, 5, 0, 2),]

ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
#        [8, 3, 4, 2, 5],
#        [2, 3, 8, 6, 0],
#        [0, 6, 5, 3, 2]])

arr.ravel()[np.ravel_multi_index(tuple(ta), (10,10,10,10))]
# array([1820, 2336, 1485, 2263,  502]

np.ravel_multi_index basically calculates, from the tuple of input arrays, the index into a flattened array that starts with shape (in this case) (10, 10, 10, 10).

Does this do what you need? Is it fast enough?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.