
I have a 1-dimensional sorted array and would like to find all pairs of elements whose difference is less than 5.

A naive approach would be to make N^2 comparisons, doing something like

import numpy as np

diffs = np.tile(x, (x.size, 1)) - x[:, np.newaxis]      # diffs[i, j] = x[j] - x[i]
D = np.logical_and(diffs > 0, diffs < 5)                # pairs with 0 < difference < 5
indices = np.argwhere(D)                                # (i, j) index pairs into x

Note that the output of my example is an array of indices into x. If I wanted the values of x which satisfy the criteria, I could do x[indices]. This works for smaller arrays, but not for arrays of the size I work with.

An idea I had was to find where there are gaps larger than 5 between consecutive elements, split the array into pieces at those gaps, and then compare all the elements within each piece.

Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?

Here is a small example:

x = np.array([ 9, 12, 
           21, 
           36, 39, 44, 46, 47, 
           58, 
           64, 65,])

The result should look like:

array([[ 0,  1],
       [ 3,  4],
       [ 5,  6],
       [ 5,  7],
       [ 6,  7],
       [ 9, 10]], dtype=int64)
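
For reference, the naive code above reproduces exactly that array on this example, and x[indices] gives the matching value pairs:

diffs = np.tile(x, (x.size, 1)) - x[:, np.newaxis]
indices = np.argwhere(np.logical_and(diffs > 0, diffs < 5))
x[indices]
# array([[ 9, 12],
#        [36, 39],
#        [44, 46],
#        [44, 47],
#        [46, 47],
#        [64, 65]])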
  • You could have two indices, left and right: left points at the smaller element, right at the larger. Once you find all the valid rights for the current left, you increment left and continue from the same right. It's an O(n) approach, and you can do it because the array is sorted. Commented Jun 12, 2018 at 19:30
  • @user1055395 It can't be O(n) and correct at the same time, because if all elements are close, just enumerating all pairs is already O(n^2). Commented Jun 12, 2018 at 20:25
  • @PaulPanzer At the time I added the comment I thought duplicates were not allowed. After learning they're allowed, I agree that O(N^2) is the worst case here. Commented Jun 12, 2018 at 20:30
  • Due to the nature of the problem, I know there are lots of gaps larger than 5, so while O(N^2) is the worst case, I am hopeful it will run much faster than that. Commented Jun 12, 2018 at 20:32
  • Can you please provide an example of the data? Commented Jun 12, 2018 at 20:50

2 Answers


Here is a solution that, for each offset k, compares every candidate element with the one k positions to its right, shrinking the candidate set until it is empty:

import numpy as np

def f_pp(A, maxgap):
    # d0[i] is the gap between consecutive elements; d[i] accumulates the
    # distance from A[i] to A[i + k] as the offset k grows
    d0 = np.diff(A)
    d = d0.copy()
    IDX = []
    k = 1
    # candidates: left indices i whose current partner A[i + k] is in range
    idx, = np.where(d <= maxgap)
    vidx = idx[d[idx] > 0]       # require a strictly positive difference, as in the question
    while vidx.size:
        IDX.append(vidx[:, None] + (0, k))   # emit pairs (i, i + k)
        if idx[-1] + k + 1 == A.size:        # last candidate would run off the end
            idx = idx[:-1]
        d[idx] = d[idx] + d0[idx + k]        # extend each candidate by one element
        k += 1
        idx = idx[d[idx] <= maxgap]          # shrink the candidate set
        vidx = idx[d[idx] > 0]
    if not IDX:                              # no close pairs at all
        return np.empty((0, 2), dtype=int)
    return np.concatenate(IDX, axis=0)
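
As a quick sanity check on the example from the question: f_pp tests d <= maxgap, while the expected output corresponds to a strict < 5 on these integers, so f_pp(x, 4) reproduces the question's pairs, ordered by offset k rather than lexicographically:

x = np.array([9, 12, 21, 36, 39, 44, 46, 47, 58, 64, 65])
print(f_pp(x, 4))
# [[ 0  1]
#  [ 3  4]
#  [ 5  6]
#  [ 6  7]
#  [ 9 10]
#  [ 5  7]]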

# random sorted data with many duplicates (duplicates are allowed, per the comments)
data = np.cumsum(np.random.exponential(size=10000)).repeat(np.random.randint(1, 20, (10000,)))

pairs = f_pp(data, 1)
#pairs = set(map(tuple, pairs))

from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds)*10, 'ms')  # seconds for 100 runs * 10 = ms per run

Sample run:

99963 points 1020651 close pairs
pp 43.00256529124454 ms

Your idea of slicing the array is a very efficient approach. Since your data are sorted, you can just compute the consecutive differences and split at the large gaps:

d = np.diff(x)
ind = np.where(d > 5)[0]        # positions where the gap to the next element exceeds 5
pieces = np.split(x, ind + 1)   # note the +1: np.split cuts *before* the given indices

Here pieces is a list of subarrays; you can then loop over it and run your own code on each piece, for example the naive pairwise comparison from your question, as sketched below.
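
A minimal sketch of that loop, assuming x as in the question; it reuses the naive comparison per piece and shifts the piece-local indices back into x:

import numpy as np

d = np.diff(x)
ind = np.where(d > 5)[0]
pieces = np.split(x, ind + 1)
offsets = np.concatenate(([0], ind + 1))    # global index of each piece's first element

result = []
for piece, off in zip(pieces, offsets):
    # the question's naive O(m^2) comparison, but only within one piece
    diffs = piece[np.newaxis, :] - piece[:, np.newaxis]
    local = np.argwhere(np.logical_and(diffs > 0, diffs < 5))
    if local.size:
        result.append(local + off)          # shift piece-local indices back into x

pairs = np.concatenate(result) if result else np.empty((0, 2), dtype=int)
# on the question's example this yields exactly the expected array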

The best algorithm depends heavily on the nature of your data, which I don't know. For example, another possibility is to write a nested loop:

pairs = []
for i in range(x.size):
    j = i + 1
    # check the bound before indexing, otherwise x[j] fails at the end of the array
    while j < x.size and x[j] - x[i] <= 5:
        pairs.append([i, j])
        j += 1

If you want it to be more clever, you can edit the loops so that j never moves backwards and the scan is not restarted after a gap; a sketch of that two-pointer variant (mentioned in the comments) follows.
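
A minimal sketch, assuming x is sorted as above; this is the two-pointer idea from the comments, not part of the original answer:

pairs = []
j = 1
for i in range(x.size):
    j = max(j, i + 1)                   # the right pointer never moves backwards
    while j < x.size and x[j] - x[i] <= 5:
        j += 1                          # grow the window while elements are in range
    pairs.extend([i, k] for k in range(i + 1, j))
# O(n + number of emitted pairs): both pointers only move forward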
