Most efficient and/or simplest method of interpolating multiple numpy timeseries arrays into one array?

Question

I have three numpy arrays where one column is the timestamp (Unix time in milliseconds, as an integer) and the other column is a reading from a sensor (integer). The three arrays cover roughly the same time span, but they are sampled at different frequencies (one at 500 Hz, the others at 125 Hz). The final array should have shape (n, 4) with columns [time, array1, array2, array3].

500.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505325032,           196],
       [1463505325034,           197],
       [1463505325036,           197],
       [1463505325038,           195]])

125.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505287912,         -5796],
       [1463505287920,         -5858],
       [1463505287928,         -5920],
       [1463505287936,         -5968]])

My current plan is as follows, but performance isn't great:

1. Find the earliest start time (because of the different frequencies and system issues, the arrays do not all start at exactly the same millisecond).
2. Create a new array with a time column that starts at the earliest time and runs as long as the longest of the three arrays. Fill the time column at the desired common frequency using np.linspace/np.arange.
3. Loop over the three arrays, using np.interp or similar to convert each to the common frequency, and then stack the output onto the common numpy array created above.
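The plan above can be sketched roughly as follows. This is only an illustration of the described steps, not a tuned solution: the function name `merge_to_common_grid`, the `step_ms` parameter, and the tiny sample arrays are made up for the example, and it assumes each input is an (n, 2) array of [time_ms, value] sorted by time. Samples outside an array's own time range are filled with NaN.

```python
import numpy as np

def merge_to_common_grid(arrays, step_ms=2):
    """Interpolate (n_i, 2) [time_ms, value] arrays onto one shared time grid.

    Hypothetical helper: step_ms=2 corresponds to a 500 Hz grid.
    """
    start = min(a[0, 0] for a in arrays)   # earliest first timestamp
    stop = max(a[-1, 0] for a in arrays)   # latest last timestamp
    t = np.arange(start, stop + 1, step_ms, dtype=np.int64)

    # float64 holds millisecond Unix timestamps exactly and allows NaN
    out = np.empty((len(t), len(arrays) + 1), dtype=np.float64)
    out[:, 0] = t
    for j, a in enumerate(arrays, start=1):
        # linear interpolation; NaN where this array has no coverage
        out[:, j] = np.interp(t, a[:, 0], a[:, 1], left=np.nan, right=np.nan)
    return out

# Tiny made-up inputs, only to show the call
a = np.array([[1000, 0], [1008, 8], [1016, 16]])
b = np.array([[1004, 10], [1006, 12], [1008, 14]])
merged = merge_to_common_grid([a, b], step_ms=2)
```

Preallocating the output once and writing each column in place avoids repeated stacking, which may help with memory on long recordings.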

I have tens of thousands of these intervals and they can be multiple days long, so hoping for something that is reasonably quick and memory efficient. Thank you!

Whoops, clarified. That's just the head/first few values. They go on for a number of minutes and do overlap. I don't think it really changes the question much, however. I'd still expect them to be able to join; it's just that there would only be data in the parts of the combined array where there is overlap, with lots of NaN/nulls everywhere else.
– user3290553, Commented Jul 13, 2020 at 18:54
I would suggest using Pandas; it has nice datetime functionality, specifically resampling. You would need to convert to a datetime first (see "Pandas converting row with unix timestamp (in milliseconds) to datetime"). Without a minimal reproducible example it might be hard to come up with a more time-efficient process.
– wwii, Commented Jul 13, 2020 at 19:26
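As a rough illustration of the Pandas route suggested in the comment, the sketch below converts the millisecond timestamps to a datetime index, joins the series on the union of their timestamps, and interpolates in time. The helper `to_series`, the column names, and the sample arrays are invented for the example; whether this is faster than plain NumPy on real data is not something the thread establishes.

```python
import numpy as np
import pandas as pd

def to_series(arr, name):
    """Hypothetical helper: (n, 2) [time_ms, value] array -> time-indexed Series."""
    index = pd.to_datetime(arr[:, 0], unit="ms")
    return pd.Series(arr[:, 1].astype(float), index=index, name=name)

# Tiny made-up inputs, only to show the calls
a = np.array([[1000, 0], [1008, 8], [1016, 16]])
b = np.array([[1004, 10], [1006, 12], [1008, 14]])

# Outer join on the union of timestamps; gaps become NaN
merged = pd.concat([to_series(a, "a"), to_series(b, "b")], axis=1).sort_index()

# Fill gaps by time-weighted linear interpolation, but only between
# valid samples, so regions without coverage stay NaN
filled = merged.interpolate(method="time", limit_area="inside")
```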

Han-Kwang Nienhuys · Accepted Answer · 2020-07-14 13:49:43Z

You'll have to interpolate the 125 Hz signal to get 500 Hz. How to do that depends on the interpolation quality you need. For linear interpolation, scipy.interpolate.interp1d in linear-interpolation mode is a bit slow, O(log n) per evaluation for n data points, because it does a bisection search for every evaluation. The calculation time explodes if you ask it to do a smooth interpolation on a large dataset, because that involves solving a system of 3n equations with 3n unknowns.

If your sampling rates have an integer ratio (1:4 in your example), you can do linear interpolation more efficiently like this:

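import numpy as np

# interpolate a125 to a500 (a125 is the 1-D array of 125 Hz sensor values)
n = len(a125)
a500 = np.zeros((n-1)*4+1)
a500[0::4] = a125  # original samples land on every 4th slot
# the three points in between are weighted averages of the neighbours
a500[1::4] = 0.75*a125[:-1] + 0.25*a125[1:]
a500[2::4] = 0.5*a125[:-1] + 0.5*a125[1:]
a500[3::4] = 0.25*a125[:-1] + 0.75*a125[1:]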
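The same idea extends to any integer ratio. The sketch below is a generalization of the snippet above rather than part of the original answer; the function name `upsample_linear` and the sample values are made up, and it assumes uniformly spaced input samples.

```python
import numpy as np

def upsample_linear(x, ratio):
    """Linearly upsample a uniformly sampled 1-D signal by an integer ratio.

    Hypothetical helper; returns (len(x) - 1) * ratio + 1 samples.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((n - 1) * ratio + 1)
    out[0::ratio] = x                      # original samples
    for k in range(1, ratio):
        w = k / ratio                      # weight of the right-hand neighbour
        out[k::ratio] = (1 - w) * x[:-1] + w * x[1:]
    return out

# Made-up values; ratio=4 mirrors the 125 Hz -> 500 Hz case
x = np.array([0.0, 4.0, 8.0, 4.0])
y = upsample_linear(x, 4)
```

The loop runs only `ratio - 1` times, and each pass is a vectorized operation over the whole array, so no per-sample search is needed.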

If you need smooth interpolation, use scipy.signal.resample. Because this is a Fourier method, it treats the signal as periodic and requires careful handling of the end points of your time series: pad the series with data that makes a gradual transition from the end point back to the start point:

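from scipy.signal import resample

# n = len(a125), as above
m = n//8  # padding length
padding = np.linspace(a125[-1], a125[0], m)  # ramp from the last value back to the first
a125_pad = np.concatenate([a125, padding])
a500b = resample(a125_pad, (n+m)*4)[:4*n]  # resample, then drop the padded tail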

Depending on the nature of your data, it might be better to have a continuous derivative at the end points.

Note that the FFT used for the resampling is fastest when the array size is a product of small prime numbers (2, 3, 5, 7). Choose the padding size (m) wisely.
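One possible way to pick the padding size is to let scipy.fft.next_fast_len choose the padded length. This is a sketch under assumptions, not the answer's own code: the function name `resample_padded` and the `min_pad_fraction` parameter are invented here, and the linear ramp is the same simple padding used above.

```python
import numpy as np
from scipy.fft import next_fast_len
from scipy.signal import resample

def resample_padded(x, ratio=4, min_pad_fraction=1 / 8):
    """Fourier-upsample x by an integer ratio with an FFT-friendly padded length.

    Hypothetical helper: pads with a linear ramp from x[-1] back to x[0].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m_min = max(1, int(n * min_pad_fraction))   # smallest padding we accept
    total = next_fast_len(n + m_min, real=True)  # round up to a fast FFT size
    m = total - n
    padding = np.linspace(x[-1], x[0], m)
    padded = np.concatenate([x, padding])
    # resample the padded signal, then drop the part that came from padding
    return resample(padded, total * ratio)[: n * ratio]

# Made-up test signal
x = np.sin(2 * np.pi * np.arange(1000) / 100)
y = resample_padded(x, ratio=4)
```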
