2

The elements of arrays x and y are floats. For each value in y, I would like to find the element of x that is closest to it (one element of x per value of y). Array x contains more than 10^6 elements and y around 10^3, and this runs inside a for loop, so it should preferably be fast.

I wanted to avoid writing another for loop, so I did the following, but it is very slow for a large y array:

import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
diff_xy = x.reshape(1, len(x)) - y.reshape(len(y), 1)  # shape (len(y), len(x))
diff_xy_abs = np.fabs(diff_xy)
args_x = np.argmin(diff_xy_abs, axis=1)  # index of the closest x for each y
x_new = x[args_x]

I'm new to Python, so any suggestion is welcome!

3 Answers

2

This comes at the cost of the original order of x and y (both are sorted in place), but does this code meet your performance needs? Note: the same value from x may be used for more than one value of y.

import numpy as np

# x = np.array([0, 0.2, 1, 2.4, 3,  5]);
# y = np.array([0, 1, 2]);
x = np.random.rand(10**6)*5000000
y = (np.random.rand(10**3)*5000000).astype(int)

x_new = np.zeros(len(y))  # Create an 'empty' array for the result

x.sort()  # could be skipped if already sorted
y.sort()  # could be skipped if already sorted

len_x = len(x)
idx_x = 0
cur_x = x[0]

for idx_y, cur_y in enumerate(y):
    while True:
        if idx_x == len_x-1: 
            # If we are at the end of x, the last value is the best value
            x_new[idx_y] = cur_x
            break
        next_x = x[idx_x+1]
        if abs(cur_y - cur_x) < abs(cur_y - next_x):
            # If the current value of x is better than the next, keep it
            x_new[idx_y] = cur_x
            break
        # Check for the next value
        idx_x += 1
        cur_x = next_x

print(x_new)

0

Maybe sort the larger array, then binary-search it for each value of the smaller array. If the value is found, that is the closest value. If it is not found, the closest value is one of the two neighbours of the position where the search ended.
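The binary-search idea above can be sketched with NumPy's `np.searchsorted`, which does the search for all values of y at once. This is only an illustration, not code from the answer: the helper name `nearest_in_x` is hypothetical, and the sketch assumes x has at least two elements.

```python
import numpy as np

def nearest_in_x(x, y):
    """For each value in y, return the closest value in x (hypothetical helper)."""
    x_sorted = np.sort(x)  # could be skipped if x is already sorted
    # Index of the first element of x_sorted that is >= each y value
    right = np.searchsorted(x_sorted, y)
    # Clip so that both neighbours (left and right) are valid indices;
    # this assumes len(x) >= 2
    right = np.clip(right, 1, len(x_sorted) - 1)
    left = right - 1
    # Keep whichever neighbour is closer; ties go to the left neighbour
    use_left = (y - x_sorted[left]) <= (x_sorted[right] - y)
    return x_sorted[np.where(use_left, left, right)]

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
print(nearest_in_x(x, y))
```

Unlike the two-pointer loop, this does not require y to be sorted, so the results come back in the original order of y.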

4 Comments

If the 2 arrays are sorted, there is no need to do a binary search. It could be solved in O(n+m), n and m being the size of the 2 arrays. See my code.
Ah, I agree. So the idea is a merge-style search: advance through x until you reach the best value for the current y, then take the next y, for which all good candidates are at the same position or further along x. So the total is at most n+m steps.
Although for a very large x, my method could be faster, since it is logarithmic in the size of x.
Mine should be around O(m * log n).
0

The following gives the desired result.

x[abs((np.tile(x, (len(y), 1)).T - y).T).argmin(axis=1)]

It tiles x for each element in y (len(y)), transposes (.T) this tiled array, subtracts y, re-transposes it, takes the absolute value of differences, determines the indexes of the minimum values using argmin (over axis=1), and finally gets the values from these indexes of x.
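To make the steps described above easier to follow, here is the same one-liner split into named intermediates on the small arrays from the question. The variable names (`tiled`, `diffs`, `idx`, `result`) are introduced only for illustration.

```python
import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])

tiled = np.tile(x, (len(y), 1))      # shape (3, 6): one copy of x per y value
diffs = (tiled.T - y).T              # shape (3, 6): row i holds x - y[i]
idx = np.abs(diffs).argmin(axis=1)   # index of the closest x for each y
result = x[idx]
print(result)

# Broadcasting produces the same differences without calling np.tile
assert np.array_equal(diffs, x[None, :] - y[:, None])
```

Either way, the intermediate array holds len(y) * len(x) values. For 10^3 by 10^6 float64 entries that is about 8 GB, which may explain the memory error reported in the comments for the full-size inputs.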

7 Comments

@Bilja: Fantastic! Glad to hear it. Upvoting is a great way to show other users who may view this question what was helpful, by the way. :)
:) Cannot upvote yet, my reputation is too low (newbie here)
@Bilja: Ah! Welcome to StackOverflow. :)
@Bilja Why do you think this solution is better than mine when mine is at least one order of magnitude faster? On my machine 2Cubed's solution ran in 38 sec and mine in 3 sec.
@2Cubed: Yes, with smaller sets, yours is faster. But with Bilja's set sizes (1e6 for x and 1e3 for y), I get Bilja: 54 sec, Yours: Memory error (Python 3.5.2 64-bit with 16 GB RAM), Mine: 4 sec.