4

I have a 3d Numpy array and would like to take the mean over one axis considering certain elements from the other two dimensions.

This is an example code depicting my problem:

import numpy as np
myarray = np.random.random((5,10,30))
yy = [1,2,3,4]
xx = [20,21,22,23,24,25,26,27,28,29]
mymean = [ np.mean(myarray[t,yy,xx]) for t in np.arange(5) ]

However, this results in:

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why does an indexing like e.g. myarray[:,[1,2,3,4],[1,2,3,4]] work, but not my code above?

3 Answers 3

5

This is how you fancy-index over more than one dimension:

>>> np.mean(myarray[np.arange(5)[:, None, None], np.array(yy)[:, None], xx],
            axis=(-1, -2))
array([ 0.49482768,  0.53013301,  0.4485054 ,  0.49516017,  0.47034123])

When you use fancy indexing, i.e. a list or array as an index, over more than one dimension, numpy broadcasts those arrays to a common shape, and uses them to index the array. You need to add those extra dimensions of length 1 at the end of the first indexing arrays, for the broadcast to work properly. Here are the rules of the game.

Sign up to request clarification or add additional context in comments.

3 Comments

Even on second glance, this is just bewildering to me. Anyhow, it works and I thank you a lot for the explanation!
It is confusing, but if you think about it long enough, you'll eventually see that it is the syntax that makes more sense and gives a more consistent behavior, especially when considering multidimensional indexing arrays. They could have implemented a special case when all indexing arrays are 1D, but as the Zen of Python states "Special cases aren't special enough to break the rules."
In case myarray is a masked arrary, I get: TypeError: tuple indices must be integers, not tuple. So masked arrays behave differently?
3

Since you use consecutive elements you can use a slice:

import numpy as np
myarray = np.random.random((5,10,30))
yy = slice(1,5)
xx = slice(20, 30)
mymean = [np.mean(myarray[t, yy, xx]) for t in np.arange(5)]

4 Comments

That is how you fix the problem, but why doesnt the example work?
Thank you! In case of no consecutive elements, what is the solution?
The problem is that broadcasting rules has to be followed in case of the integer array subindexing. If you want non-consecutive, use boolean indexing.
@Ophion docs If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised:
3

To answer your question about why it doesn't work: when you use lists/arrays as indices, Numpy uses a different set of indexing semantics than it does if you use slices. You can see the full story in the documentation and, as that page says, it "can be somewhat mind-boggling".

If you want to do it for nonconsecutive elements, you must grok that complex indexing mechanism.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.