
How can this next for-loop get a speedup with numpy? I guess some fancy indexing trick can be used here, but I have no idea which one (can einsum be used here?).

a=0
for i in range(len(b)):
    a+=numpy.mean(C[d,e,f+b[i]])*g[i]

edit: C is a numpy 3D array of shape comparable to (20, 1600, 500). d, e, f are indices of points that are "interesting" (d, e, f all have the same length, around 900); b and g have the same length (around 50). The mean is taken over all the points in C with the indices d, e, f + b[i].
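For reference, here is a self-contained setup matching the shapes described above (random data, purely illustrative) that makes the loop runnable as posted:

```python
import numpy

# Random stand-ins with the shapes from the question
C = numpy.random.rand(20, 1600, 500)          # the 3D data array
d = numpy.random.randint(0, 20, size=900)     # indices of "interesting" points
e = numpy.random.randint(0, 1600, size=900)
f = numpy.random.randint(0, 400, size=900)    # headroom so f + b stays in bounds
b = numpy.random.randint(0, 100, size=50)     # offsets applied to f
g = numpy.random.rand(50)                     # weights

a = 0
for i in range(len(b)):
    a += numpy.mean(C[d, e, f + b[i]]) * g[i]
```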

4 Comments

  • It would help to have a MCVE. What is C, what are d, e, f, g and b? What do you actually want to compute? You sum over means... why? Please give a bit of context, so that it becomes possible to see what optimisation is possible. Commented Oct 16, 2014 at 12:06
  • @Unapiedra updated a bit Commented Oct 16, 2014 at 12:11
  • Side note: you can use enumerate in your for loop to get indexes: for i, item in enumerate(b): Commented Oct 16, 2014 at 14:11
  • @DiZou Ah yeah, I forgot about that - thanks Commented Oct 16, 2014 at 14:27

4 Answers

3

You can do the following trick:

C[d, e][:, np.add.outer(f, b)].dot(g).diagonal().mean()

This can be improved further by taking, up front, only the elements that will end up on the diagonal:

C[d, e][np.arange(len(d))[:, None], np.add.outer(f, b)].dot(g).mean()
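As a sanity check, here is a minimal sketch (random data with the question's shapes) verifying that the second one-liner matches the loop. It works because the mean over points and the weighted sum over offsets commute by linearity: row i of `C[d, e]` is `C[d[i], e[i], :]`, and `np.add.outer(f, b)` picks columns `f[i] + b` out of each such row.

```python
import numpy as np

# Random stand-ins with the shapes from the question
C = np.random.rand(20, 1600, 500)
d = np.random.randint(0, 20, size=900)
e = np.random.randint(0, 1600, size=900)
f = np.random.randint(0, 400, size=900)
b = np.random.randint(0, 100, size=50)
g = np.random.rand(50)

# Loop version for reference
loop = sum(np.mean(C[d, e, f + b[i]]) * g[i] for i in range(len(b)))

rows = C[d, e]                    # shape (900, 500): row i is C[d[i], e[i], :]
cols = np.add.outer(f, b)         # shape (900, 50): entry (i, j) is f[i] + b[j]
# Index row i at columns f[i] + b, weight by g, average over points
vec = rows[np.arange(len(d))[:, None], cols].dot(g).mean()

assert np.isclose(loop, vec)
```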

7 Comments

Your second version is two times slower than my version, but I noticed in the added info that I swapped two values - so maybe that's the cause (I've now re-updated with the right values). Your type of solution is the kind of thing I am looking for, though.
@usethedeathstar And what about the first version? Does it perform faster? I had the impression that the second version would scale better, but I didn't test it.
The first version is about 300 times slower.
@usethedeathstar Which NumPy version are you using? I am on 1.9.0, and the second solution is faster than yours, because they improved fancy-indexing performance very significantly...
Does anyone know how much the speed difference is for this construction between 1.8.0 and 1.9.0?
1

Timings

Both sessions were initialized with

In [1]: C = np.random.rand(20,1600,500)

In [2]: d = np.random.randint(0, 20, size=900)

In [3]: e = np.random.randint(1600, size=900)

In [4]: f = np.random.randint(400, size=900)

In [5]: b = np.random.randint(100, size=50)

In [6]: g = np.random.rand(50)

Numpy 1.9.0

In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 942 µs per loop

In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 762 µs per loop

In [9]: %%timeit                                               
   ...: a = 0
   ...: for i in range(len(b)):                                     
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...: 
100 loops, best of 3: 2.25 ms per loop

In [10]: np.__version__
Out[10]: '1.9.0'

In [11]: %%timeit
(C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
                                 e[:,np.newaxis],
                                 f[:,np.newaxis] + b), dims=C.shape)]
 .mean(axis=0).dot(g))
   ....: 
1000 loops, best of 3: 940 µs per loop

Numpy 1.8.2

In [7]: %timeit C[d,e,f + b[:,np.newaxis]].mean(axis=1).dot(g)
100 loops, best of 3: 2.81 ms per loop

In [8]: %timeit C[d[:,np.newaxis],e[:, np.newaxis],f[:, np.newaxis] + b].mean(axis=0).dot(g)
100 loops, best of 3: 2.7 ms per loop

In [9]: %%timeit                                               
   ...: a = 0
   ...: for i in range(len(b)):                                     
   ...:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   ...: 
100 loops, best of 3: 4.12 ms per loop

In [10]: np.__version__
Out[10]: '1.8.2'

In [51]: %%timeit
(C.ravel()[np.ravel_multi_index((d[:,np.newaxis],
                                 e[:,np.newaxis],
                                 f[:,np.newaxis] + b), dims=C.shape)]
 .mean(axis=0).dot(g))
   ....: 
1000 loops, best of 3: 1.4 ms per loop

Description

You can use a coordinate-broadcasting trick to build the full 50x900 array from the beginning:

In [158]: C[d,e,f + b[:, np.newaxis]].shape
Out[158]: (50, 900)

From that point, mean and dot will get you to the destination:

In [159]: C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
Out[159]: 13.582349962518611

In [160]: a = 0
   .....: for i in range(len(b)):
   .....:     a += np.mean(C[d, e, f + b[i]]) * g[i]
   .....: print(a)
   .....:
13.5823499625

And it's about 3.3x faster than the loop version:

In [161]: %timeit C[d,e,f + b[:, np.newaxis]].mean(axis=1).dot(g)
1000 loops, best of 3: 585 µs per loop

In [162]: %%timeit                                               
a = 0
for i in range(len(b)):                                     
    a += np.mean(C[d, e, f + b[i]]) * g[i]
   .....: 
1000 loops, best of 3: 1.95 ms per loop

The array is of significant size, so you must factor in the CPU cache. I cannot say I know how np.sum traverses the array, but in 2D arrays there is always a slightly better way (when the next element you pick is adjacent in memory) and a slightly worse way (when the next element is found across the stride). Let's see if we can win something more by transposing the array during indexing:

In [196]: C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
Out[196]: 13.582349962518608

In [197]: %timeit C[d[:,np.newaxis], e[:,np.newaxis], f[:,np.newaxis] + b].mean(axis=0).dot(g)
1000 loops, best of 3: 461 µs per loop

That's 4.2x faster than the loop.
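The ravelled-index variant benchmarked above computes flat offsets into `C.ravel()` with `np.ravel_multi_index`, which sidesteps the slower multi-dimensional fancy-indexing path on older NumPy. A minimal sketch with the same assumed random setup:

```python
import numpy as np

# Random stand-ins with the shapes from the question
C = np.random.rand(20, 1600, 500)
d = np.random.randint(0, 20, size=900)
e = np.random.randint(0, 1600, size=900)
f = np.random.randint(0, 400, size=900)
b = np.random.randint(0, 100, size=50)
g = np.random.rand(50)

# Convert the (d, e, f + b) index triples into flat offsets into C.ravel();
# the broadcast result has shape (900, 50), one column per offset in b.
flat = np.ravel_multi_index(
    (d[:, np.newaxis], e[:, np.newaxis], f[:, np.newaxis] + b),
    dims=C.shape)
result = C.ravel()[flat].mean(axis=0).dot(g)

# Identical to the direct multi-dimensional fancy indexing
direct = C[d[:, np.newaxis], e[:, np.newaxis], f[:, np.newaxis] + b].mean(axis=0).dot(g)
assert np.isclose(result, direct)
```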

5 Comments

Is your solution using NumPy 1.9.0, or is it faster on 1.8.0 as well?
Interesting, np1.8 is indeed significantly slower than 1.9. I've updated the answer with 1.8.2 timings.
Curiouser, there's a "1.9ms" constant slowdown for each of the three compared variants.
It appears there was a significant pessimization of multi-dimensional integer lookups; the same lookup performed with ravelled indexers is significantly faster on numpy-1.8 (see the updated benchmarks).
Changed this to my accepted answer, since it shows great benchmarking of several solutions, for both 1.9 and 1.8.
1

It's quite similar to the loopy version:

np.sum(np.mean(C[d,e,f+b[:,None]], axis=1) * g)

And you can combine the summation and multiplication into a dot product:

C[d,e,f+b[:,None]].mean(1).dot(g)

But for the timing that doesn't seem to matter; the indexing operation is by far the most time-consuming part (at least on NumPy 1.8.0). Compared to that, the loop overhead in the original code is insignificant.
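A quick sketch (random data with the question's shapes, purely illustrative) confirming that the two formulations agree:

```python
import numpy as np

# Random stand-ins with the shapes from the question
C = np.random.rand(20, 1600, 500)
d = np.random.randint(0, 20, size=900)
e = np.random.randint(0, 1600, size=900)
f = np.random.randint(0, 400, size=900)
b = np.random.randint(0, 100, size=50)
g = np.random.rand(50)

# Explicit sum of weighted means, mirroring the loop
explicit = np.sum(np.mean(C[d, e, f + b[:, None]], axis=1) * g)
# Same computation with the summation and multiplication fused into a dot product
fused = C[d, e, f + b[:, None]].mean(1).dot(g)

assert np.isclose(explicit, fused)
```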

Comments

0

The only speedup you could hope for, in structural terms, would be with the following code:

# Initialize a 2-D array, one column per offset in b.
# (d, e, f are paired point indices, so C[d, e, f + b[i]] is 1-D of length len(d).)
aggregated = numpy.zeros((len(d), len(b)))
# Populate it with the shifted copies of C
for i in range(len(b)):
    aggregated[:, i] = C[d, e, f + b[i]]

# Compute the mean over the points axis
means = numpy.mean(aggregated, axis=0)
# Multiply term-by-term by g (means and g have the same length) and sum
a = numpy.sum(means * g)

However, this does not guarantee that the calculation will be faster; it may even be slower, for the following reasons:

  • Populating the intermediate array has a non-negligible cost, since it copies memory
  • b is very small, so you won't win much anyway. If b is bigger this can become interesting, as long as d, e, f get smaller too

In any case you should benchmark both solutions. You could also try using something like Cython to perform the for loop, but that seems like overkill.
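A minimal benchmark harness along those lines (plain `timeit`, random data with the question's shapes; the helper names are made up for illustration):

```python
import timeit
import numpy

# Random stand-ins with the shapes from the question
C = numpy.random.rand(20, 1600, 500)
d = numpy.random.randint(0, 20, size=900)
e = numpy.random.randint(0, 1600, size=900)
f = numpy.random.randint(0, 400, size=900)
b = numpy.random.randint(0, 100, size=50)
g = numpy.random.rand(50)

def looped():
    # The original loop over the offsets in b
    a = 0
    for i in range(len(b)):
        a += numpy.mean(C[d, e, f + b[i]]) * g[i]
    return a

def vectorized():
    # Broadcast all offsets at once, then mean over points and dot with g
    return C[d, e, f + b[:, None]].mean(axis=1).dot(g)

for fn in (looped, vectorized):
    t = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print("%s: %.2f ms" % (fn.__name__, t * 1e3))
```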

Comments
