Timings
Both sessions were initialized with
In [1]: C = np.random.rand(20,1600,500)
In [2]: d = np.random.randint(0, 20, size=900)
In [3]: e = np.random.randint(1600, size=900)
In [4]: f = np.random.randint(400, size=900)
In [5]: b = np.random.randint(100, size=50)
In [6]: g = np.random.rand(50)
NumPy 1.9.0
In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 942 µs per loop
In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 762 µs per loop
In [9]: %%timeit
   ...: a = 0
   ...: for i in range(len(b)):
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...:
100 loops, best of 3: 2.25 ms per loop
In [10]: np.__version__
Out[10]: '1.9.0'
In [11]: %%timeit
   ....: (C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
   ....:                                  e[:,np.newaxis],
   ....:                                  f[:,np.newaxis] + b), dims=C.shape)]
   ....:  .mean(axis=0).dot(g))
   ....:
1000 loops, best of 3: 940 µs per loop
NumPy 1.8.2
In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
100 loops, best of 3: 2.81 ms per loop
In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
100 loops, best of 3: 2.7 ms per loop
In [9]: %%timeit
   ...: a = 0
   ...: for i in range(len(b)):
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...:
100 loops, best of 3: 4.12 ms per loop
In [10]: np.__version__
Out[10]: '1.8.2'
In [51]: %%timeit
   ....: (C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
   ....:                                  e[:,np.newaxis],
   ....:                                  f[:,np.newaxis] + b), dims=C.shape)]
   ....:  .mean(axis=0).dot(g))
   ....:
1000 loops, best of 3: 1.4 ms per loop
Description
You can use the coordinate broadcasting trick to build the full 50x900 array up front:
In [158]: C[d,e,f + b[:, np.newaxis]].shape
Out[158]: (50, 900)
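As a minimal self-contained sketch of why this works (small shapes chosen here for illustration, mirroring the names in the session above): integer index arrays broadcast just like values do. `b[:, np.newaxis]` has shape (50, 1) in the session, `f` has shape (900,), so `f + b[:, np.newaxis]` is (50, 900), and the 1-D `d` and `e` broadcast against it:

```python
import numpy as np

# Small stand-ins for the arrays in the session above (assumed shapes).
rng = np.random.RandomState(0)
C = rng.rand(4, 6, 10)
d = rng.randint(0, 4, size=5)   # first-axis indices, shape (5,)
e = rng.randint(0, 6, size=5)   # second-axis indices, shape (5,)
f = rng.randint(0, 5, size=5)   # third-axis base offsets, shape (5,)
b = rng.randint(0, 5, size=3)   # shifts, shape (3,)

# f + b[:, np.newaxis] broadcasts to shape (3, 5); d and e, both (5,),
# broadcast against it, so the fancy index yields a (3, 5) result where
# out[i, j] == C[d[j], e[j], f[j] + b[i]].
out = C[d, e, f + b[:, np.newaxis]]
print(out.shape)  # (3, 5)
```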
From there, mean and dot take you the rest of the way:
In [159]: C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
Out[159]: 13.582349962518611
In [160]: a = 0
   .....: for i in range(len(b)):
   .....:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   .....: print(a)
   .....:
13.5823499625
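The agreement between the two can also be checked programmatically; a small self-contained reproduction with the same setup as the session above (random seed chosen arbitrarily here):

```python
import numpy as np

# Same shapes as in the session; seed is arbitrary, for reproducibility.
rng = np.random.RandomState(0)
C = rng.rand(20, 1600, 500)
d = rng.randint(0, 20, size=900)
e = rng.randint(1600, size=900)
f = rng.randint(400, size=900)
b = rng.randint(100, size=50)
g = rng.rand(50)

# Vectorized: build the (50, 900) array, take row means, weight by g.
vectorized = C[d, e, f + b[:, np.newaxis]].mean(axis=1).dot(g)

# Looped reference implementation.
looped = 0.0
for i in range(len(b)):
    looped += np.mean(C[d, e, f + b[i]]) * g[i]

print(np.allclose(vectorized, looped))  # True
```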
And it's about 3.3x faster than the loop version:
In [161]: %timeit C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 585 µs per loop
In [162]: %%timeit
   .....: a = 0
   .....: for i in range(len(b)):
   .....:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   .....:
1000 loops, best of 3: 1.95 ms per loop
The array is of significant size, so CPU cache effects come into play. I can't say exactly how np.sum traverses the array, but with a 2-D array there is always a slightly better traversal order (the next element you pick is adjacent in memory) and a slightly worse one (the next element is found a stride away). Let's see if we can win something more by transposing the array during indexing:
In [196]: C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
Out[196]: 13.582349962518608
In [197]: %timeit C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 461 µs per loop
That's 4.2x faster than the loop.
Comment: What are C, d, e, f, g and b? What do you actually want to compute? You sum over means... why? Please give a bit of context, so that it becomes possible to see what optimisation is possible. The loop could also be written for i, item in enumerate(b):