The algorithm just builds up a new list from an input data array. It only appends a new element from the input array once the element has crossed the visibleDelta threshold of the previous stored element:
def subsample(data, visibleDelta):
subsampled = [data[0]]
for point in data[1:]:
if abs(point - subsampled[len(subsampled) - 1]) > visibleDelta:
subsampled.append(point)
return subsampled
Problem is I need this to run on very large datasets (~1B values), and I'd like to use numpy or some other numerical library to do this if possible.
I should probably mention that the 'real' function won't just deal with a 1D array of data. The input data will be a pandas dataframe, with the first column being x values, and the second being y values (I'll be comparing the y values).
Any way to do this efficiently?
