Before I start I will say I know this has been asked before, but I've struggled to implement the methods that have been suggested (such as running it via PyPy). This is a last ditch attempt to speed up the code.
Basically I have a piece of code that's around 600 lines long. The bulk of the code takes around 30 seconds to run, but one small section (4 lines long) takes between 5-15 minutes to run. Simple reason for this is it's a mathematical equation in a for-loop, in a for-loop, in a for-loop. So this equation is being calculated in the order of 50 million times. I accept it's going to take a while, but when the same thing is run within MATLAB it's done in under a minute normally. I believe this is because of JIT acceleration; but I might be wrong. Either way this makes me feel there must be a way of speeding this up. The code section is below (the matrices being used are pretty big so I thought I'd just say their dimensions as the numbers within them can vary).
for k in range(7500):
for jj in range(2):
for ii in range(k+1):
Y[k][jj,0] += S[ii][jj] * c[k-ii][jj,jj] * U[ii][jj,jj]
Where the size of the matrices (/arrays) are:
numpy.shape(Y) = (7500, 2, 2)
numpy.shape(S) = (7500, 2, 1)
numpy.shape(c) = (7500, 2, 2)
numpy.shape(U) = (7500, 2, 2)
Does anyone see anything I could do to speed this up?
Edit 1:
As requested here's the MATLAB version of the above:
for k=1:7500
for j=1:2
for i=1:7500
Y(j,1,k)=Y(j,1,k)+S(j,1,i)*c(j,j,k+1-i)*U(j,j,i);
end
end
end
Edit 2:
Should have added, I'm using 3.4.2
Also, sadly I don't have the source maths behind the code. I have it for around 2/3 of the code, but not the latter third. I just have the MATLAB code to convert. (For now at least)
c[k-ii][jj,jj] * U[ii][jj,jj]could possibly be something liketmp = reverse(c) * U. (I'm just approximating the notation, but hopefully the idea is clear.) Actually, since ii goes up to k+1, it will be a bit more complicated, but something with a cumulative sum or cumulative product may help. What's the corresponding MATLAB code?Y = S*c*U.... mayberange()might be a lot slower than (the laxy, 3.x-like)xrange().