Struggling With Python For-Loop Speed

Question

Before I start I will say I know this has been asked before, but I've struggled to implement the methods that have been suggested (such as running it via PyPy). This is a last ditch attempt to speed up the code.

Basically I have a piece of code that's around 600 lines long. The bulk of the code takes around 30 seconds to run, but one small section (4 lines long) takes between 5-15 minutes to run. Simple reason for this is it's a mathematical equation in a for-loop, in a for-loop, in a for-loop. So this equation is being calculated in the order of 50 million times. I accept it's going to take a while, but when the same thing is run within MATLAB it's done in under a minute normally. I believe this is because of JIT acceleration; but I might be wrong. Either way this makes me feel there must be a way of speeding this up. The code section is below (the matrices being used are pretty big so I thought I'd just say their dimensions as the numbers within them can vary).

    for k in range(7500):                   
        for jj in range(2):
            for ii in range(k+1):
                 Y[k][jj,0] += S[ii][jj] * c[k-ii][jj,jj] * U[ii][jj,jj]

Where the size of the matrices (/arrays) are:

numpy.shape(Y) = (7500, 2, 2)
numpy.shape(S) = (7500, 2, 1)
numpy.shape(c) = (7500, 2, 2)
numpy.shape(U) = (7500, 2, 2)

Does anyone see anything I could do to speed this up?

Edit 1:

As requested here's the MATLAB version of the above:

for k=1:7500
    for j=1:2
       for i=1:7500

           Y(j,1,k)=Y(j,1,k)+S(j,1,i)*c(j,j,k+1-i)*U(j,j,i);

       end
    end
end

Edit 2:

Should have added, I'm using 3.4.2

Also, sadly I don't have the source maths behind the code. I have it for around 2/3 of the code, but not the latter third. I just have the MATLAB code to convert. (For now at least)

Can you explain what the computation is, outside of the Python code? With MATLAB, numpy, etc., it's often more efficient to use the built-in matrix/array processing functions that a for loop an iterating manually. For instance, something like c[k-ii][jj,jj] * U[ii][jj,jj] could possibly be something like tmp = reverse(c) * U. (I'm just approximating the notation, but hopefully the idea is clear.) Actually, since ii goes up to k+1, it will be a bit more complicated, but something with a cumulative sum or cumulative product may help. What's the corresponding MATLAB code? — Joshua Taylor
– Joshua Taylor, Commented Jan 27, 2015 at 16:46
I dont think you need the loops .... but I cant tell what your doing enough to say for sure ... maybe just Y = S*c*U .... maybe — Joran Beasley
– Joran Beasley, Commented Jan 27, 2015 at 16:47
@JoranBeasley It will be a bit more complicated than that, on account of ii ranging from 0 to k+1. There's a kind of cumulative summation going on. — Joshua Taylor
– Joshua Taylor, Commented Jan 27, 2015 at 16:59
I've added the MATLAB version as an edit to my OP. This is what I've had to base my Python code off of (rather than basing off a raw mathematical equation for example). — Steve
– Steve, Commented Jan 27, 2015 at 17:05
If this is Python 2.x, then range() might be a lot slower than (the laxy, 3.x-like) xrange(). — unwind
– unwind, Commented Jan 27, 2015 at 17:05

pv. · Accepted Answer · 2015-01-27 17:35:53Z

2

The result can be obtained using np.convolve.

import numpy as np

S = np.random.rand(1000, 2, 1)
c = np.random.rand(1000, 2, 2)
U = np.random.rand(1000, 2, 2)

Y = np.zeros_like(U)
for k in range(1000):
    for jj in range(2):
        for ii in range(k+1):
            Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

Yx = np.zeros_like(Y)
for jj in range(2):
    Yx[:,jj,0] += np.convolve(S[:,jj,0] * U[:,jj,jj], c[:,jj,jj], mode='full')[:Yx.shape[0]]

print(abs(Y - Yx).max())
# -> 3.12638803734e-13

How to find this? Notice that things are just multiplied together along the jj axis, and that the ii summation is actually a convolution. Then it's just a matter of fiddling the indices correctly in the numpy function.

If you want additional speed, subsituting convolve with scipy.signal.fftconvolve may speed it up even more. Some timings:

for loops:         77 s
np.convolve:       33.6 ms
fftconvolve:       1.48 ms

This gives a nice ~ 50000x speedup.

Note also that you should always write Y[k,jj,0] and not Y[k][jj,0] -- since there is no JIT, the latter creates a temporary array view, which is going to cost you if you evaluate the expression a large number of times. Rewriting the line in your for loop expression as

Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

speeds up the evaluation already by a factor of 4 (!).

edited Jan 27, 2015 at 17:35

answered Jan 27, 2015 at 17:12

pv.

35.4k9 gold badges62 silver badges51 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Steve Over a year ago

This has solved it perfectly. Got the total run time from 677 seconds down to 26 seconds! And based on the tip at the end I reckon I can go back and trim some time off of other places. I'd be surprised if I couldn't get the whole thing to under 15 seconds! Thank you very much!

pv. Over a year ago

In cases where the array operations are not easily rewritten as fast primitives as here, you could just rewrite it using Cython (which has the advantage of requiring minimal code reorganization when moving from Python). In this case, it would have required extra definitions def compute(double[:,:,:] Y, double[:,:,:] S, double[:,:,:] c, double[:,:,:] U): cdef int k, ii, jj but the for loops could be kept as is.

Collectives™ on Stack Overflow

Struggling With Python For-Loop Speed

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related