
I'm doing a lot of simulations in Python, simulating system responses.

I've been using a Runge-Kutta scheme, but I recently came across another scheme that I've been testing.

When testing this in MATLAB I get exceptional performance compared to my Runge-Kutta implementation. However, when I transferred it to Python, it was significantly slower.

I'm not sure if this is just how it is, or if I can improve my way of coding, so I would love to hear your input, if possible.

The MATLAB code, as an example:

dt = 0.0001;                                     % time step
f = randn(1, 60 / dt);                           % random input signal, 1 x ns
ns = length(f);
yo = zeros(3, 1);                                % previous output y[i-1]
P1 = [0; 0.001; 0];
F = [1 0.0001 0; 0.001 1 0.0001; 0.001 0 1];
y1 = zeros(3, ns);                               % preallocated output
tic
for i = 1:ns
    y1(:, i) = P1*f(:, i) + F*yo;                % y[i] = P1*f[i] + F*y[i-1]
    yo = y1(:, i);
end
toc

The loop executes in 0.55-0.61 s.

The equivalent Python code:

import time
import numpy as np

dt = 0.0001                                      # time step
f = np.random.randn(1, int(60 / dt))             # random input signal, 1 x ns
ns = np.size(f)
yo = np.zeros(3)                                 # previous output y[i-1]
F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
P1 = np.transpose(np.array([[0, 0.0001, 0]]))    # 3 x 1
y1 = np.zeros((3, ns), order='F')                # preallocated output
start_time = time.time()
for i in range(ns):                              # range(ns), not range(ns - 1), to match MATLAB's 1:ns
    y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)   # np.dot(F, yo) is already shape (3,), no reshape needed
    yo = y1[:, i]
print("--- %s seconds ---" % (time.time() - start_time))

The loop executes in 2.8-3.1 s.

Can I do something to improve this?

Thanks for considering my question.

  • I looked at an earlier version of your code, and I didn't see how it could be refactored (in particular, vectorized). Have you considered using a dedicated differential equation (ODE) solver from scipy.integrate? (See the state-space sketch after these comments.) Commented Jan 16, 2019 at 20:45
  • You might look into numba. Commented Jan 16, 2019 at 20:51
  • @DillonDavis some context is missing, but it's not complete code. You have more rep than me on CR, but anyway: CR asks that we don't habitually send questions there from SO. Performance optimization questions are still on topic here. Commented Jan 16, 2019 at 20:55
  • I'm new to CR myself. After reading that link, I agree with you @AndrasDeak. Commented Jan 16, 2019 at 21:00
  • @AndrasDeak, I have attempted to use scipy.signal.lsim; however, my scheme appears to be more robust for my usage and is still faster in most situations. Commented Jan 16, 2019 at 21:06
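
The scipy suggestions in the comments are worth spelling out: the recursion y[i] = P1*f[i] + F*y[i-1] is a linear discrete-time state-space update, so it can also be written without a hand-rolled loop using scipy.signal.dlsim. The sketch below is an illustration of that idea rather than code from the thread, and it assumes the 0.0001 value for P1 used in the Python version; picking A = F, B = P1, C = F, D = P1 makes the internal state equal the previous output, so dlsim reproduces the loop exactly. Note that dlsim still steps through the samples internally, so this is a clarity win more than a guaranteed speed win.

import numpy as np
from scipy import signal

dt = 0.0001
f = np.random.randn(1, int(60 / dt))   # input signal, shape (1, ns)
F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
P1 = np.array([[0], [0.0001], [0]])    # shape (3, 1)

# With x[k+1] = F x[k] + P1 u[k] and y[k] = F x[k] + P1 u[k], the state x[k]
# equals y[k-1] (for x[0] = 0), so y[k] = P1*u[k] + F*y[k-1] as in the loop.
tout, yout, xout = signal.dlsim((F, P1, F, P1, dt), f.T)
y1 = yout.T                            # back to the 3 x ns layout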

2 Answers


I suggested using numba in the comments. Here is an example:

import numba
import numpy as np

def py_func(dt, F, P1):
    f = np.random.randn(1, int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((3, ns), order='F')
    for i in range(ns):
        y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)
        yo = y1[:, i]
    return yo

@numba.jit(nopython=True)
def numba_func(dt, F, P1):
    f = np.random.randn(1, int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((3, ns))          # C order; see the note below
    for i in range(ns):
        y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)
        yo = y1[:, i]
    return yo

You can't use 'F' (Fortran) order with Numba here, since it works with C-ordered (row-major) arrays rather than Fortran-ordered ones.
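
One way around this, sketched below with a hypothetical numba_func_rows (my naming, not from the thread, and untimed), is to store samples along the first axis instead, so that each y1[i] is a contiguous row in C order while the recursion stays the same:

@numba.jit(nopython=True)
def numba_func_rows(dt, F, P1):
    f = np.random.randn(int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((ns, 3))   # samples along axis 0: each y1[i] row is contiguous in C order
    for i in range(ns):
        y1[i] = P1[:, 0] * f[i] + np.dot(F, yo)
        yo = y1[i]
    return yo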

The timing differences between py_func and numba_func are shown below:

Pure python loop:

%%timeit
py_func(dt, F, P1)

Results:

2.88 s ± 100 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba:

%%timeit
numba_func(dt, F, P1)

Results:

588 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2 Comments

Note that Fortran order should only be a matter of convenience; one can just permute the arrays and have the "fast" axes last in the Python version. And Numba is probably pretty close in performance to what MATLAB's JIT compiler does.
I've installed Numba, and that seems to do nicely. Thanks a lot, I will get right on to implementing it in my code!

I optimized your code a bit; the execution time for me went from 2.8 s down to around 1.2 s. Before you look for faster interpreters, I recommend you profile (see line_profiler) and move everything you can out of the innermost loop. Better yet, try to avoid explicit for loops altogether and rely on NumPy functions such as dot, einsum, etc.

There is probably still some room for optimization. I don't think I changed your values, but you should double-check. With other tools such as Numba, Cython (cython.org), or PyPy (pypy.org), I expect your execution time will improve quite a lot more.

#!/usr/bin/env python3

import numpy as np
import time

np.random.seed(0)

#@profile
def run():
    dt = 0.0001
    f = np.random.randn(1, int(60 / dt))
    ns = np.size(f)
    yo = np.zeros(3)
    F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
    P1 = np.transpose(np.array([[0, 0.0001, 0]]))
    start_time = time.time()
    y1 = np.outer(f, P1)            # precompute the P1*f[i] term for all samples at once; shape (ns, 3)
    for i in range(ns):
        y1[i] += F @ yo             # add the recursive F*y[i-1] term row by row
        yo = y1[i]
    print("--- %s seconds ---" % (time.time() - start_time))
    y1 = y1.T                       # back to the 3 x ns layout
    print(yo)

run()
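
As a usage note (an assumption about tooling, not part of the original answer): the commented-out @profile decorator above is the hook that line_profiler looks for. With the package installed, you can uncomment it and run the script under kernprof to get per-line timings for run():

kernprof -l -v sim.py   # 'sim.py' stands in for whatever this script is saved as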

2 Comments

This is actually really good. I tried implementing this with Numba to achieve even faster computation times, but I don't think the "@" operator, or maybe the np.outer() function, is available; I end up with errors. So for now I'm sticking with the Numba solution.
I would like to revise my earlier comment: after more than just the initial testing, this turned out to be the most stable solution. Sometimes Numba achieved exceptional times, other times it didn't, whereas this one is very consistent in its timing, so it ended up being my preferred choice.
