Python: how to find elements in array x which have values close to elements in array y?

Question

Elements in arrays x and y are floats. I would like to find the elements in array x whose values are as close as possible to those in array y (for each value in y, one element of x). Array x contains more than 10^6 elements and array y around 10^3, and this is part of a for loop, so it should preferably be fast.

I tried to avoid writing another for loop, so I did the following, but it is very slow for a large y array:

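import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
# Matrix of differences, shape (len(y), len(x)): row i holds x - y[i]
diff_xy = x.reshape(1, len(x)) - y.reshape(len(y), 1)
diff_xy_abs = np.fabs(diff_xy)
# Index of the closest x for each y
args_x = np.argmin(diff_xy_abs, axis=1)
x_new = x[args_x]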

I'm new to Python, so any suggestion is welcome!

Cabu · Accepted Answer · 2016-08-19 08:40:32Z

It comes at the cost of the original order of x and y (both get sorted), but does this code meet your performance needs? Note: the same value from x can be used for more than one value of y.

import numpy as np

# x = np.array([0, 0.2, 1, 2.4, 3,  5]);
# y = np.array([0, 1, 2]);
x = np.random.rand(10**6)*5000000
y = (np.random.rand(10**3)*5000000).astype(int)

x_new = np.zeros(len(y))  # Create an 'empty' array for the result

x.sort()  # could be skipped if already sorted
y.sort()  # could be skipped if already sorted

len_x = len(x)
idx_x = 0
cur_x = x[0]

for idx_y, cur_y in enumerate(y):
    while True:
        if idx_x == len_x-1: 
            # If we are at the end of x, the last value is the best value
            x_new[idx_y] = cur_x
            break
        next_x = x[idx_x+1]
        if abs(cur_y - cur_x) < abs(cur_y - next_x):
            # If the current value of x is better than the next, keep it
            x_new[idx_y] = cur_x
            break
        # Check for the next value
        idx_x += 1
        cur_x = next_x

print(x_new)
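Because the loop above returns x_new in sorted-y order, one possible extension (not part of the original answer) is to sort y through np.argsort and write each result back to y's original position. The function name closest_in_sorted_scan is hypothetical; the scan itself follows the loop above, advancing along x while the next value is at least as close to the current y.

```python
import numpy as np


def closest_in_sorted_scan(x, y):
    """Hypothetical helper: merge-style scan that keeps y's original order.

    Sorts a copy of x, visits y in ascending order via np.argsort, walks x
    only forward, and stores each result at y's original index.
    """
    xs = np.sort(x)            # sorted copy; the caller's x is not modified
    order = np.argsort(y)      # indexes of y in ascending order of value
    ys = y[order]

    result = np.empty(len(y), dtype=xs.dtype)
    idx_x = 0
    last = len(xs) - 1
    for k, cur_y in enumerate(ys):
        # Advance while the next x is at least as close to cur_y as the current one
        while idx_x < last and abs(cur_y - xs[idx_x + 1]) <= abs(cur_y - xs[idx_x]):
            idx_x += 1
        result[order[k]] = xs[idx_x]   # write back at y's original position
    return result


x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([2.0, 0.0, 1.0])
print(closest_in_sorted_scan(x, y))  # values 2.4, 0.0, 1.0, in y's original order
```

Sorting y still costs O(m log m), but that is small next to sorting x when y has about 10^3 elements.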

Markus Mikkolainen · Accepted Answer · 2016-08-18 12:30:03Z

Maybe sort the larger array, then binary-search it for each value of the smaller array. If a value is found, that is the closest value, and the nearby values sit at the neighbouring indexes; if it is not found, the closest values are on either side of the point where the search stopped.
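The idea above maps onto np.searchsorted, which is NumPy's vectorised binary search over a sorted array. A minimal sketch, assuming x has at least two elements; the function name closest_by_binary_search is hypothetical. It sorts x once (O(n log n)), searches all of y at once (O(m log n)), and compares the two neighbours around each insertion point:

```python
import numpy as np


def closest_by_binary_search(x, y):
    """Hypothetical helper: closest value in x for each y, keeping y's order.

    Assumes len(x) >= 2 so both neighbours of every insertion point exist.
    """
    xs = np.sort(x)
    # Position at which each y would be inserted to keep xs sorted
    idx = np.searchsorted(xs, y)
    # Clip so that idx - 1 and idx are both valid indexes (handles y outside xs)
    idx = np.clip(idx, 1, len(xs) - 1)
    left = xs[idx - 1]
    right = xs[idx]
    # Keep whichever neighbour is closer; ties go to the smaller (left) value
    return np.where(np.abs(y - left) <= np.abs(right - y), left, right)


x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
print(closest_by_binary_search(x, y))  # values 0.0, 1.0, 2.4
```

Unlike the merge-style loop, y does not need to be sorted, so the results come back in y's original order.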

4 Comments

Cabu Over a year ago

If the two arrays are sorted, there is no need for a binary search. It can be solved in O(n+m), n and m being the sizes of the two arrays. See my code.

Markus Mikkolainen Over a year ago

Ah, I agree. So the idea is a merge-sort-style scan: advance through x until you reach the best value for the current y, then move to the next y, whose good values are at the same position or further along x. So the total work is at most n+m steps.

Markus Mikkolainen Over a year ago

Although for a very large x, my method could be faster, since it is logarithmic in the size of x.

Markus Mikkolainen Over a year ago

mine should be around O(m * log n)

2Cubed · Accepted Answer · 2016-08-18 13:10:28Z

The following gives the desired result.

x[abs((np.tile(x, (len(y), 1)).T - y).T).argmin(axis=1)]

It tiles x once per element of y (len(y) rows), transposes (.T) the tiled array, subtracts y, transposes it back, takes the absolute value of the differences, finds the index of the minimum in each row with argmin (over axis=1), and finally takes the values of x at those indexes.
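The same brute-force computation can be written with broadcasting instead of np.tile, and processing y in chunks bounds the memory it needs. For 10^3 × 10^6 elements the full float64 distance array alone takes about 10^9 × 8 bytes = 8 GB. A sketch under that assumption; the function name closest_brute_force_chunked and its chunk parameter are hypothetical. With chunk=16 and len(x) = 10^6, each distance array is about 16 × 10^6 × 8 bytes = 128 MB. The run time is still O(n·m), so it does not replace the sorted approaches:

```python
import numpy as np


def closest_brute_force_chunked(x, y, chunk=16):
    """Hypothetical helper: brute-force closest x for each y, chunk rows at a time.

    Builds a (chunk, len(x)) distance array per step instead of the full
    (len(y), len(x)) array; ties resolve to the first index, like argmin.
    """
    out = np.empty(len(y), dtype=x.dtype)
    for start in range(0, len(y), chunk):
        y_part = y[start:start + chunk]
        # Broadcasting (k, 1) against (1, n) gives a (k, n) array without np.tile
        dist = np.abs(y_part[:, None] - x[None, :])
        out[start:start + chunk] = x[dist.argmin(axis=1)]
    return out


x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
print(closest_brute_force_chunked(x, y, chunk=2))  # values 0.0, 1.0, 2.4
```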

7 Comments

2Cubed Over a year ago

@Bilja: Fantastic! Glad to hear it. Upvoting is a great way to show other users who may view this question what was helpful, by the way. :)

Bilja Over a year ago

:) Cannot upvote yet, my reputation is too low (newbie here)

2Cubed Over a year ago

@Bilja: Ah! Welcome to StackOverflow. :)

Cabu Over a year ago

@Bilja Why do you think this solution is better than mine when mine is at least an order of magnitude faster? On my machine, 2Cubed's solution runs in 38 s and mine in 3 s.

Cabu Over a year ago

@2Cubed: Yes, with smaller sets yours is faster. But with Bilja's set sizes (1e6 for x and 1e3 for y), I get: Bilja's: 54 s; yours: MemoryError (Python 3.5.2 64-bit with 16 GB RAM); mine: 4 s.
