
I'm doing a lot of simulations in Python, simulating system responses.

I've been using a Runge-Kutta scheme, but I recently came across another scheme that I've been testing.

When testing this in MATLAB I get exceptional performance compared to my Runge-Kutta implementation. However, when I transferred it to Python, it was significantly slower.

I'm not sure if this is just how it is, or if I can improve my way of coding, so I would love to hear your input, if possible.

The MATLAB code, as an example:

dt = 0.0001;                                     % time step
f = randn(1, 60 / dt);                           % random input signal, 1 x ns
ns = length(f);
yo = zeros(3, 1);                                % previous output y[i-1]
P1 = [0; 0.001; 0];
F = [1 0.0001 0; 0.001 1 0.0001; 0.001 0 1];
y1 = zeros(3, ns);                               % preallocated output
tic
for i = 1:ns
    y1(:, i) = P1*f(:, i) + F*yo;                % y[i] = P1*f[i] + F*y[i-1]
    yo = y1(:, i);
end
toc

The loop executes in 0.55-0.61 s.

The equivalent Python code:

import time
import numpy as np

dt = 0.0001                                      # time step
f = np.random.randn(1, int(60 / dt))             # random input signal, 1 x ns
ns = np.size(f)
yo = np.zeros(3)                                 # previous output y[i-1]
F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
P1 = np.transpose(np.array([[0, 0.0001, 0]]))    # 3 x 1
y1 = np.zeros((3, ns), order='F')                # preallocated output
start_time = time.time()
for i in range(ns):                              # range(ns), not range(ns - 1), to match MATLAB's 1:ns
    y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)   # np.dot(F, yo) is already shape (3,), no reshape needed
    yo = y1[:, i]
print("--- %s seconds ---" % (time.time() - start_time))

The loop executes in 2.8-3.1 s.

Can I do something to improve this?

Thanks for considering my question.

  • I looked at an earlier version of your code, and I didn't see how it could be refactored (in particular, vectorized). Have you considered using a dedicated differential equation (ODE) solver from scipy.integrate? (See the state-space sketch after these comments.) Commented Jan 16, 2019 at 20:45
  • You might look into numba. Commented Jan 16, 2019 at 20:51
  • @DillonDavis some context is missing, but it's not complete code. You have more rep than me on CR, but anyway: CR asks that we don't habitually send questions there from SO. Performance optimization questions are still on topic here. Commented Jan 16, 2019 at 20:55
  • I'm new to CR myself. After reading that link, I agree with you @AndrasDeak. Commented Jan 16, 2019 at 21:00
  • @AndrasDeak, I have attempted to use scipy.signal.lsim; however, my scheme appears to be more robust for my usage and is still faster in most situations. Commented Jan 16, 2019 at 21:06
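
The scipy suggestions in the comments are worth spelling out: the recursion y[i] = P1*f[i] + F*y[i-1] is a linear discrete-time state-space update, so it can also be written without a hand-rolled loop using scipy.signal.dlsim. The sketch below is an illustration of that idea rather than code from the thread, and it assumes the 0.0001 value for P1 used in the Python version; picking A = F, B = P1, C = F, D = P1 makes the internal state equal the previous output, so dlsim reproduces the loop exactly. Note that dlsim still steps through the samples internally, so this is a clarity win more than a guaranteed speed win.

import numpy as np
from scipy import signal

dt = 0.0001
f = np.random.randn(1, int(60 / dt))   # input signal, shape (1, ns)
F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
P1 = np.array([[0], [0.0001], [0]])    # shape (3, 1)

# With x[k+1] = F x[k] + P1 u[k] and y[k] = F x[k] + P1 u[k], the state x[k]
# equals y[k-1] (for x[0] = 0), so y[k] = P1*u[k] + F*y[k-1] as in the loop.
tout, yout, xout = signal.dlsim((F, P1, F, P1, dt), f.T)
y1 = yout.T                            # back to the 3 x ns layout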

2 Answers


I suggested using numba in the comments. Here is an example:

import numba
import numpy as np

def py_func(dt, F, P1):
    f = np.random.randn(1, int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((3, ns), order='F')
    for i in range(ns):
        y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)
        yo = y1[:, i]
    return yo

@numba.jit(nopython=True)
def numba_func(dt, F, P1):
    f = np.random.randn(1, int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((3, ns))          # C order; see the note below
    for i in range(ns):
        y1[:, i] = np.dot(P1, f[:, i]) + np.dot(F, yo)
        yo = y1[:, i]
    return yo

You can't use 'F' (Fortran) order with Numba here, since it works with C-ordered (row-major) arrays rather than Fortran-ordered ones.
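
One way around this, sketched below with a hypothetical numba_func_rows (my naming, not from the thread, and untimed), is to store samples along the first axis instead, so that each y1[i] is a contiguous row in C order while the recursion stays the same:

@numba.jit(nopython=True)
def numba_func_rows(dt, F, P1):
    f = np.random.randn(int(60 / dt))
    ns = f.size
    yo = np.zeros(3)
    y1 = np.zeros((ns, 3))   # samples along axis 0: each y1[i] row is contiguous in C order
    for i in range(ns):
        y1[i] = P1[:, 0] * f[i] + np.dot(F, yo)
        yo = y1[i]
    return yo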

The timing differences between py_func and numba_func are shown below:

Pure python loop:

%%timeit
py_func(dt, F, P1)

Results:

2.88 s ± 100 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba:

%%timeit
numba_func(dt, F, P1)

Results:

588 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2 Comments

Note that Fortran order should only be a matter of convenience; one can just permute the arrays and have the "fast" axes last in the Python version. And Numba is probably pretty close in performance to what MATLAB's JIT compiler does.
I've installed Numba, and that seems to do nicely. Thanks a lot, I will get right on to implementing it in my code!

I optimized your code a bit; the execution time for me went from 2.8 s down to around 1.2 s. Before you look for faster interpreters, I recommend you profile (see line_profiler) and move everything you can out of the innermost loop. Better yet, try to avoid explicit for loops altogether and rely on NumPy functions such as dot, einsum, etc.

There is probably still some room for optimization. I don't think I changed your values, but you should double-check. With other tools such as Numba, Cython (cython.org), or PyPy (pypy.org), I expect your execution time will improve quite a lot more.

#!/usr/bin/env python3

import numpy as np
import time

np.random.seed(0)

#@profile
def run():
    dt = 0.0001
    f = np.random.randn(1, int(60 / dt))
    ns = np.size(f)
    yo = np.zeros(3)
    F = np.array([[1, 0.0001, 0], [0.001, 1, 0.0001], [0.001, 0, 1]])
    P1 = np.transpose(np.array([[0, 0.0001, 0]]))
    start_time = time.time()
    y1 = np.outer(f, P1)            # precompute the P1*f[i] term for all samples at once; shape (ns, 3)
    for i in range(ns):
        y1[i] += F @ yo             # add the recursive F*y[i-1] term row by row
        yo = y1[i]
    print("--- %s seconds ---" % (time.time() - start_time))
    y1 = y1.T                       # back to the 3 x ns layout
    print(yo)

run()
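
As a usage note (an assumption about tooling, not part of the original answer): the commented-out @profile decorator above is the hook that line_profiler looks for. With the package installed, you can uncomment it and run the script under kernprof to get per-line timings for run():

kernprof -l -v sim.py   # 'sim.py' stands in for whatever this script is saved as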

2 Comments

This is actually really good. I tried implementing this with Numba to achieve even faster computation times, but I don't think the "@" operator, or maybe the np.outer() function, is available; I end up with errors. So for now I'm sticking with the Numba solution.
I would like to revise my earlier comment: after more than just the initial testing, this turned out to be the most stable solution. Sometimes Numba achieved exceptional times, other times it didn't, whereas this one is very consistent in its timing, so it ended up being my preferred choice.
