I have three NumPy arrays, each with two columns: a timestamp (Unix time in milliseconds, as an integer) and a sensor reading (integer). The three arrays cover roughly the same time span, but they are sampled at different frequencies (one at 500 Hz, the others at 125 Hz). The final array should be (n, 4) with columns [time, array1, array2, array3].
500.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505325032, 196],
       [1463505325034, 197],
       [1463505325036, 197],
       [1463505325038, 195]])
125.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505287912, -5796],
       [1463505287920, -5858],
       [1463505287928, -5920],
       [1463505287936, -5968]])
My current plan is as follows, but performance isn't great:
- Find the earliest start time (because of the different frequencies and system issues, the arrays do not all start at exactly the same millisecond)
- Create a new array with a time column that starts at the earliest time and runs as long as the longest of the three arrays. Fill the time column at the desired common frequency using np.linspace/np.arange
- Loop over the three arrays, using np.interp or similar to convert each to the common frequency, and then stack the output onto the common numpy array created above
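For reference, the plan above can be sketched roughly as follows. This is only an illustration under some assumptions: the function name `resample_to_common` and the `freq_hz` parameter are made up for this example, each input is assumed to be an (n, 2) array already sorted by time, and samples outside an array's own time span are filled with NaN rather than extrapolated.

```python
import numpy as np

def resample_to_common(arrays, freq_hz):
    """Interpolate several (n, 2) [time_ms, value] arrays onto one time grid.

    Hypothetical helper for illustration; assumes each array is sorted by time.
    """
    step_ms = 1000.0 / freq_hz
    # Earliest start and latest end across all inputs (Unix time in ms).
    start = min(a[0, 0] for a in arrays)
    stop = max(a[-1, 0] for a in arrays)
    # Build the grid from a sample count to avoid float-step overshoot in arange.
    n = int(np.floor((stop - start) / step_ms)) + 1
    t = start + step_ms * np.arange(n)
    # Preallocate the (n, 1 + number of inputs) result once instead of stacking.
    out = np.empty((n, len(arrays) + 1), dtype=np.float64)
    out[:, 0] = t
    for i, a in enumerate(arrays, start=1):
        # Linear interpolation; NaN outside this array's own time span (assumption).
        out[:, i] = np.interp(t, a[:, 0], a[:, 1], left=np.nan, right=np.nan)
    return out

# Tiny made-up inputs: one sampled every 2 ms, one every 8 ms.
fast = np.array([[1000, 0], [1002, 2], [1004, 4], [1006, 6]])
slow = np.array([[1000, 0], [1008, 80]])
combined = resample_to_common([fast, slow], freq_hz=500.0)
print(combined.shape)
```

Writing into a preallocated output avoids the extra copy that each stacking call would make, though the result is float64 throughout because NaN cannot be stored in an integer array.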
I have tens of thousands of these intervals, and they can be multiple days long, so I'm hoping for something that is reasonably quick and memory efficient. Thank you!
NaN/nulls everywhere else.