averaging over subsets of array in numpy

Question

I have a numpy array of shape (10, 10, 10, 60). The dimensions could be arbitrary; this is just an example.

I want to reduce this to an array of shape (10, 10, 10, 20) by taking the mean over some subsets. I have two scenarios:

1: Take the mean of every (10, 10, 10, 20) block, i.e. split the array into three (10, 10, 10, 20) blocks and take the mean across the three. This can be done with: m = np.mean((x[..., :20], x[..., 20:40], x[..., 40:60]), axis=0). My question is: how can I do this when the size of the last dimension is arbitrary, without writing an explicit loop? I can do something like:

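import numpy as np

x = np.random.rand(10, 10, 10, 60)
offset = 20                                 # length of each block along the last axis
loops = x.shape[3] // offset                # number of blocks
result = np.zeros(x.shape[:3] + (offset,))
for i in range(loops):
    index = i * offset
    result += x[..., index:index+offset]    # accumulate block i
result = result / loops                     # turn the sum into a mean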

However, this does not seem very Pythonic, and I was wondering whether there is a more elegant way to do this.

2: Another scenario is that I want to break it down into 10 arrays of shape (10, 10, 10, 2, 3), take the mean along the 5th dimension between these ten arrays, and then reshape the result to a (10, 10, 10, 20) array as originally planned. I can reshape the array, take the average as before and reshape again, but that second part seems quite inelegant.

For part 2: could you elaborate on the breaking-down part? Or give us a working loopy implementation?
– Divakar, Commented Jan 26, 2017 at 13:41
Actually, your solution would work for both parts, it seems! I just need to reshape differently, I think. Let me try!
– Luca, Commented Jan 26, 2017 at 13:42

Divakar · Accepted Answer · 2017-01-26 13:34:25Z

You could reshape so that the last axis is split into two, with the first of the two new axes having length equal to the number of blocks, and then take the mean along the second-to-last axis:

m, n, r = x.shape[:3]
out = x.reshape(m, n, r, 3, -1).mean(axis=-2)  # 3 is the number of blocks
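
To make the reshape idea independent of the array's shape, here is a minimal sketch. The helper names mean_over_blocks and mean_over_groups are made up for illustration and are not NumPy functions. The second helper rests on an assumption about scenario 2, namely that each run of 3 consecutive elements along the last axis should be averaged; if that is not what is meant, only the reshape would need to change.

```python
import numpy as np

def mean_over_blocks(x, n_blocks):
    """Scenario 1: average n_blocks contiguous blocks of the last axis."""
    if x.shape[-1] % n_blocks:
        raise ValueError("last axis length must be divisible by n_blocks")
    # (..., n_blocks * L) -> (..., n_blocks, L), then average over the block axis
    return x.reshape(x.shape[:-1] + (n_blocks, -1)).mean(axis=-2)

def mean_over_groups(x, group_size):
    """Assumed reading of scenario 2: average each run of group_size neighbours."""
    if x.shape[-1] % group_size:
        raise ValueError("last axis length must be divisible by group_size")
    # (..., G * group_size) -> (..., G, group_size), then average within each group
    return x.reshape(x.shape[:-1] + (-1, group_size)).mean(axis=-1)

x = np.random.rand(10, 10, 10, 60)
blocks = mean_over_blocks(x, 3)   # shape (10, 10, 10, 20)
groups = mean_over_groups(x, 3)   # shape (10, 10, 10, 20)
```

The two helpers differ only in where the new axis goes: putting the block count first averages elements that are 20 apart, while putting the group size last averages adjacent elements.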

Alternatively, we could use np.einsum for a noticeable performance boost:

In [200]: x = np.random.rand(10, 10, 10, 60)

In [201]: %timeit x.reshape(m,n,r,3,-1).mean(axis=-2)
1000 loops, best of 3: 430 µs per loop

In [202]: %timeit np.einsum('ijklm->ijkm',x.reshape(m,n,r,3,-1))/3.0
1000 loops, best of 3: 214 µs per loop
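
As a sanity check, the following sketch confirms that the two expressions compute the same thing. It relies on the einsum rule that an index missing from the output subscripts (here l, the block axis) is summed over, so dividing by the number of blocks gives the mean. The variable names are only for illustration, and timings like those above will likely differ by machine and NumPy version.

```python
import numpy as np

x = np.random.rand(10, 10, 10, 60)
m, n, r = x.shape[:3]
blocks = x.reshape(m, n, r, 3, -1)   # shape (10, 10, 10, 3, 20)

via_mean = blocks.mean(axis=-2)
# 'l' does not appear after '->', so einsum sums over the block axis
via_einsum = np.einsum('ijklm->ijkm', blocks) / blocks.shape[-2]

same = np.allclose(via_mean, via_einsum)
print(same)
```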

edited Jan 26, 2017 at 13:34

answered Jan 26, 2017 at 13:28

Divakar
