
Before I start I will say I know this has been asked before, but I've struggled to implement the methods that have been suggested (such as running it via PyPy). This is a last-ditch attempt to speed up the code.

Basically I have a piece of code that's around 600 lines long. The bulk of it runs in about 30 seconds, but one small section (4 lines long) takes between 5 and 15 minutes. The reason is simple: it's a mathematical equation inside three nested for-loops, so the equation is being evaluated on the order of 50 million times. I accept it's going to take a while, but the same thing normally runs in MATLAB in under a minute. I believe this is because of JIT acceleration, but I might be wrong. Either way, it makes me feel there must be a way of speeding this up. The code section is below (the matrices being used are pretty big, so I'll just give their dimensions, as the numbers within them can vary).

    for k in range(7500):
        for jj in range(2):
            for ii in range(k+1):
                Y[k][jj,0] += S[ii][jj] * c[k-ii][jj,jj] * U[ii][jj,jj]

Where the size of the matrices (/arrays) are:

numpy.shape(Y) = (7500, 2, 2)
numpy.shape(S) = (7500, 2, 1)
numpy.shape(c) = (7500, 2, 2)
numpy.shape(U) = (7500, 2, 2)

Does anyone see anything I could do to speed this up?

Edit 1:

As requested here's the MATLAB version of the above:

for k=1:7500
    for j=1:2
       for i=1:7500

           Y(j,1,k)=Y(j,1,k)+S(j,1,i)*c(j,j,k+1-i)*U(j,j,i);

       end
    end
end

Edit 2:

Should have added: I'm using Python 3.4.2.

Also, sadly, I don't have the source maths behind the code. I have it for around two-thirds of the code, but not the latter third; I just have the MATLAB code to convert (for now, at least).

  • Can you explain what the computation is, outside of the Python code? With MATLAB, numpy, etc., it's often more efficient to use the built-in matrix/array processing functions than to iterate manually with a for loop. For instance, something like c[k-ii][jj,jj] * U[ii][jj,jj] could possibly be something like tmp = reverse(c) * U. (I'm just approximating the notation, but hopefully the idea is clear.) Actually, since ii goes up to k+1, it will be a bit more complicated, but something with a cumulative sum or cumulative product may help. What's the corresponding MATLAB code? Commented Jan 27, 2015 at 16:46
  • I don't think you need the loops ... but I can't tell what you're doing well enough to say for sure ... maybe just Y = S*c*U ... maybe Commented Jan 27, 2015 at 16:47
  • @JoranBeasley It will be a bit more complicated than that, on account of ii ranging from 0 to k+1. There's a kind of cumulative summation going on. Commented Jan 27, 2015 at 16:59
  • I've added the MATLAB version as an edit to my OP. This is what I've had to base my Python code off of (rather than basing off a raw mathematical equation for example). Commented Jan 27, 2015 at 17:05
  • If this is Python 2.x, then range() might be a lot slower than (the lazy, 3.x-like) xrange(). Commented Jan 27, 2015 at 17:05

1 Answer


The result can be obtained using np.convolve.

import numpy as np

S = np.random.rand(1000, 2, 1)
c = np.random.rand(1000, 2, 2)
U = np.random.rand(1000, 2, 2)

Y = np.zeros_like(U)
for k in range(1000):
    for jj in range(2):
        for ii in range(k+1):
            Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

Yx = np.zeros_like(Y)
for jj in range(2):
    Yx[:,jj,0] += np.convolve(S[:,jj,0] * U[:,jj,jj], c[:,jj,jj], mode='full')[:Yx.shape[0]]

print(abs(Y - Yx).max())
# -> 3.12638803734e-13

How to find this? Notice that things are just multiplied together along the jj axis, and that the ii summation is actually a convolution. Then it's just a matter of fiddling the indices correctly in the numpy function.
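To see that pattern in isolation, here is a minimal 1-D sketch (with made-up three-element arrays) showing that the nested-loop sum y[k] = Σ a[ii] * b[k-ii] for ii from 0 to k is exactly np.convolve truncated to the original length:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Direct double loop: y[k] = sum over ii of a[ii] * b[k - ii]
y = np.zeros(3)
for k in range(3):
    for ii in range(k + 1):
        y[k] += a[ii] * b[k - ii]

# The same values via np.convolve, keeping only the first len(a) terms
yc = np.convolve(a, b, mode='full')[:3]
print(np.allclose(y, yc))  # -> True
```

In the answer's code, a is S[:,jj,0] * U[:,jj,jj] and b is c[:,jj,jj], done once per jj.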

If you want additional speed, substituting convolve with scipy.signal.fftconvolve may speed it up even more. Some timings:

for loops:         77 s
np.convolve:       33.6 ms
fftconvolve:       1.48 ms

This gives a nice ~ 50000x speedup.
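The substitution is a one-line change, since fftconvolve takes the same arguments as np.convolve here. A sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.signal import fftconvolve

np.random.seed(0)
S = np.random.rand(1000, 2, 1)
c = np.random.rand(1000, 2, 2)
U = np.random.rand(1000, 2, 2)

Y_direct = np.zeros((1000, 2, 2))
Y_fft = np.zeros((1000, 2, 2))
for jj in range(2):
    seq = S[:, jj, 0] * U[:, jj, jj]
    # fftconvolve multiplies in Fourier space, which is much faster for
    # long sequences, at the cost of a tiny floating-point discrepancy.
    Y_direct[:, jj, 0] = np.convolve(seq, c[:, jj, jj], mode='full')[:1000]
    Y_fft[:, jj, 0] = fftconvolve(seq, c[:, jj, jj], mode='full')[:1000]

print(np.abs(Y_direct - Y_fft).max())  # tiny, round-off level
```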

Note also that you should always write Y[k,jj,0] and not Y[k][jj,0] -- since there is no JIT, the latter creates a temporary array view, which will cost you if you evaluate the expression a large number of times. Rewriting the line in your for loop as

Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

speeds up the evaluation already by a factor of 4 (!).
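The temporary view is easy to observe directly; a small sketch of what chained indexing actually does:

```python
import numpy as np

Y = np.zeros((7500, 2, 2))

# Y[k][jj, 0] runs two indexing operations: Y[42] first materialises
# a 2x2 view object...
view = Y[42]
print(view.base is Y)   # -> True: an intermediate view was created

# ...which is then indexed again. Y[k, jj, 0] is a single indexing
# operation that reaches the same element with no intermediate object.
Y[42, 1, 0] += 1.0
print(Y[42][1, 0])      # -> 1.0, same element either way
```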


2 Comments

This has solved it perfectly. Got the total run time from 677 seconds down to 26 seconds! And based on the tip at the end I reckon I can go back and trim some time off of other places. I'd be surprised if I couldn't get the whole thing to under 15 seconds! Thank you very much!
In cases where the array operations are not as easily rewritten as fast primitives as they are here, you could rewrite the loops in Cython (which has the advantage of requiring minimal code reorganization when moving from Python). In this case it would have required the extra definitions def compute(double[:,:,:] Y, double[:,:,:] S, double[:,:,:] c, double[:,:,:] U): and cdef int k, ii, jj, but the for loops could be kept as-is.
