Is there a way to perform this subsampling algorithm in numpy?

Question

The algorithm builds a new list from an input data array. It appends an element from the input only when that element differs from the most recently stored element by more than visibleDelta:

def subsample(data, visibleDelta):
    subsampled = [data[0]]

    for point in data[1:]:
        if abs(point - subsampled[-1]) > visibleDelta:
            subsampled.append(point)

    return subsampled

Problem is I need this to run on very large datasets (~1B values), and I'd like to use numpy or some other numerical library to do this if possible.

I should probably mention that the 'real' function won't just deal with a 1D array of data. The input will be a pandas DataFrame, with the first column holding x values and the second holding y values (I'll be comparing the y values).

Any way to do this efficiently?

B. M. · Accepted Answer · 2016-03-06 23:10:19Z

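If you want to track the data this way, NumPy is not the right tool: each comparison depends on the last element that was kept, so the loop is inherently sequential and cannot be expressed as a single vectorized operation. See Numba or Cython for efficiency.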
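As a rough illustration of the Numba route, here is a minimal sketch rather than a tuned implementation. It assumes a non-empty 1D float array, and it falls back to plain Python if Numba is not installed. The name `subsample_indices` is hypothetical, and returning indices instead of values is a choice made here so the result can be used to select rows.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba the same loop runs as (slow) plain Python.
    def njit(func):
        return func


@njit
def subsample_indices(y, visible_delta):
    """Return indices of the points the question's loop would keep."""
    # Worst case every point is kept, so the buffer is as long as the input.
    keep = np.empty(y.size, dtype=np.int64)
    keep[0] = 0          # the first point is always kept
    n = 1                # number of indices stored so far
    last = y[0]          # value of the most recently kept point
    for i in range(1, y.size):
        if abs(y[i] - last) > visible_delta:
            keep[n] = i
            n += 1
            last = y[i]
    return keep[:n]


y = np.array([0.0, 0.1, 0.3, 0.35, 0.6, 0.0])
idx = subsample_indices(y, 0.2)
print(idx)  # [0 2 4 5]
```

For the DataFrame described in the question, one option would be to pass `df.iloc[:, 1].to_numpy()` as `y` and then select the kept rows with `df.iloc[idx]`.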

A slightly different approach is to fix threshold levels (multiples of visibledelta) and look for the points where the data crosses them:

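import numpy as np
import matplotlib.pyplot as plt

data = np.sin(np.arange(1e6) / 3e4)
visibledelta = 0.2
# Assign each value to a band of height visibledelta.
cat = np.floor(data / visibledelta)
# Indices whose successor falls in a different band.
subsample = np.flatnonzero(np.diff(cat))
plt.plot(data)
plt.plot(subsample, data[subsample], 'o')
plt.show()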

which gives a plot of the data with a marker wherever it moves into a new band.

Some adjustment may be needed, but the data is split into chunks.
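To show the kind of adjustment that may be needed, here is a small sketch of how the band approach differs from the original loop. The helper name `band_crossings` and the sample values are made up for illustration. When the data hovers around a band boundary, every step counts as a crossing even though the values barely change:

```python
import numpy as np


def band_crossings(data, visible_delta):
    """Indices i where data[i] and data[i + 1] fall in different bands."""
    cat = np.floor(data / visible_delta)
    return np.flatnonzero(np.diff(cat))


# Values that oscillate around the 0.2 band boundary.
data = np.array([0.19, 0.21, 0.19, 0.21, 0.19])
crossings = band_crossings(data, 0.2)
print(crossings)  # [0 1 2 3]
```

The loop from the question would keep only the first point here, since no later value differs from 0.19 by more than 0.2. Whether that difference matters depends on how noisy the data is.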
