3

I like to least-squares match data (a numpy array of floats) with many known signal shapes. My code works but is too slow for the many runs I plan to do:

import numpy
import time

samples = 50000
width_signal = 100
data = numpy.random.normal(0, 1, samples)
signal = numpy.random.normal(0, 1, width_signal)  # Placeholder

t0 = time.clock()
for i in range(samples - width_signal):
    data_chunk = data[i:i + width_signal]
    residuals = data_chunk - signal
    squared_residuals = residuals**2
    summed_residuals = numpy.sum(squared_residuals)
t1 = time.clock()
print('Time elapsed (sec)', t1-t0)

EDIT: Corrected a mistake: First square residuals, then sum them.

This takes about 0.2 sec to run on my machine. As I have many datasets and signal shapes, this is too slow. My specific problem does not allow for typical MCMC methods because the signal shapes are too different. It has to be brute force.

Typical volumes are 50,000 floats for the data and 100 for the signal. These can vary by a factor of a few.

My tests show that:

  • The summing of the data numpy.sum(residuals) eats 90% of the time. I tried Python's sum(residuals) and it is faster for small arrays (~<50 elements) and slower for bigger arrays. Should I insert an if condition?
  • I tried numpy.roll() instead of fetching data directly, and .roll() is slower.

Questions:

  • Is there a logical improvement for speed-up?
  • Is there a faster way to sum arrays? I know no C, but if it is much faster I could try that.
  • Can a GPU help? I have many runs to do. If so, where could I find a code snippet to do this?
9
  • You may have some luck with pandas.rolling but then the overhead of creating the DFs might start to dominate Commented Aug 24, 2018 at 10:20
  • Did you consider using a convolution instead of the sum of squared differences? Also, does the square have to be inside the sum? Commented Aug 24, 2018 at 10:33
  • Ah of course, I made a mistake: The squaring of the residuals must come first, then the summing of these squared residuals. I changed it in the Q. Is it possible to do this with np.lib.stride_tricks.as_strided? Thanks so much. Commented Aug 24, 2018 at 10:51
  • Should be for i in range(samples - width_signal + 1): to cover all. Commented Aug 24, 2018 at 10:55
  • Regarding the +1 in length: In practice, I will have to roll over the edges. The real data is phase-folded, so the signal can begin near the end of the data and continue at the beginning. Commented Aug 24, 2018 at 11:06

2 Answers 2

6

Based on the various methods proposed in Compute mean squared, absolute deviation and custom similarity measure - Python/NumPy, we look to solve our case here.

Approach #1

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows and thus have our first solution here, like so -

from skimage.util import view_as_windows

d = view_as_windows(data,(width_signal))-signal # diffs
out = np.einsum('ij,ij->i',d,d)

More info on use of as_strided based view_as_windows.

Approach #2

Again based on the matrix-multiplication trick in that answer post, we could better the performance, like so -

def MSD_strided(data, signal):
    w = view_as_windows(data,(width_signal))
    return (w**2).sum(1) + (signal**2).sum(0) - 2*w.dot(signal)

Approach #3

We will improve on approach#2 by bringing on uniform filtering and convolution -

from scipy.ndimage.filters import uniform_filter 

def MSD_uniffilt_conv(data, signal):
    hW = width_signal//2
    l = len(data)-len(signal)+1
    parte1 = uniform_filter(data**2,width_signal)[hW:hW+l]*width_signal
    parte3 = np.convolve(data, signal[::-1],'valid')    
    return parte1 + (signal**2).sum(0) - 2*parte3

Benchmarking

Timings on posted sample -

In [117]: %%timeit
     ...: for i in range(samples - width_signal + 1):
     ...:     data_chunk = data[i:i + width_signal]
     ...:     residuals = data_chunk - signal
     ...:     squared_residuals = residuals**2
     ...:     summed_residuals = numpy.sum(squared_residuals)
1 loop, best of 3: 239 ms per loop

In [118]: %%timeit
     ...: d = view_as_windows(data,(width_signal))-signal
     ...: np.einsum('ij,ij->i',d,d)
100 loops, best of 3: 11.1 ms per loop

In [209]: %timeit MSD_strided(data, signal)
10 loops, best of 3: 18.4 ms per loop

In [210]: %timeit MSD_uniffilt_conv(data, signal)
1000 loops, best of 3: 1.71 ms per loop

~140x speedup there with the third one!

Sign up to request clarification or add additional context in comments.

4 Comments

I did quite a bit of searching for a moving window in numpy and this approach didn't appear anywhere. Ok, it's not part of numpy but it's a lot better than what I was finding. Should I have been searching for something other than "numpy moving window" or should it be added as an answer to those threads?
@roganjosh You would find a lot of np.lib.stride_tricks.as_strided around on SO. scikit-image's view_as_windows is a recent addition to scikit-image, so that might be why it's not around much, but I find it pretty neat as compared to tinkering around with the strides and shapes arg for as_strided one. Hence, I have been using this view_as_windows quite regularly whenever I need to work with sliding windows. As for the term - "NumPy windows" could be good keywords.
Yep, I did see np.lib.stride_tricks.as_strided actually, but I agree that this is a neat way for it being packaged up :)
. . . Wow. That last approach in your edit is pretty amazing.
3

Apart from the versions given by Divakar you could simply use a compiler like Numba or Cython.

Exmaple

import numba as nb
@nb.njit(fastmath=True,parallel=True)
def sq_residuals(data,signal):
  summed_residuals=np.empty(data.shape[0]+1-signal.shape[0],dtype=data.dtype)
  for i in nb.prange(data.shape[0] - signal.shape[0]+1):
      sum=0.
      for j in range(signal.shape[0]):
        sum+=(data[i+j]-signal[j])**2
      summed_residuals[i]=sum
  return summed_residuals

Timings

Numba 0.4dev, Python 3.6, Quadcore i7
MSD_uniffilt_conv(Divakar): 2.4ms

after the first call which invokes some compilation overhead:
sq_residuals              : 1.7ms

1 Comment

Thanks! This is a useful alternative for cases where Numba/Cython can be used.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.