The following section of my code is taking ages to run (it's the only loop in the function, so it's the most likely culprit):

tree = KDTree(x_rest)
for i in range(len(x_lost)):
    _, idx = tree.query([x_lost[i]], k=int(np.sqrt(len(x_rest))), p=1)
    y_lost[i] = mode(y_rest[idx][0])[0][0]

Is there a way to speed this up? I have a few suggestions from Stack Overflow:

  • I can't believe I didn't think of that. So would calling tree.query(x_lost, k=...) give me an array of idx? Perhaps the last line could use a list comprehension then. Commented Feb 5, 2023 at 0:15

1 Answer

Here are a few notes about how you could speed this up:

  1. This code loops over x_lost and calls tree.query() with one point from x_lost at a time. However, query() supports querying multiple points at once. The loop inside query() is implemented in Cython, so I would expect it to be much faster than a loop written in Python. If you pass all of x_lost in a single call, it returns an array of neighbor indices with one row per query point.

  2. The query() function supports a parameter called workers, which if set to a value larger than one, runs your query in parallel. Since workers is implemented using threads, it will likely be faster than a solution using multiprocessing.Pool, since it avoids pickling. See the documentation.

  3. The code above doesn't define the mode() function, but I'm assuming it's scipy.stats.mode(). If that's the case, rather than calling mode() repeatedly, you can use the axis argument, which would let you take the mode of nearby points for multiple queries at once.
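Putting the three notes together, here is a minimal sketch of what the batched version might look like. It assumes KDTree is scipy.spatial.KDTree, mode is scipy.stats.mode, and y_rest is a NumPy array of numeric labels; none of that is confirmed by the question. The function name predict_lost_labels and its workers argument are made up for illustration.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.stats import mode


def predict_lost_labels(x_rest, y_rest, x_lost, workers=None):
    """Give each row of x_lost the most common label among its k nearest rows of x_rest."""
    k = int(np.sqrt(len(x_rest)))
    tree = KDTree(x_rest)
    # Only pass `workers` when asked to, since older SciPy versions do not accept it.
    extra = {} if workers is None else {"workers": workers}
    # One batched query instead of a Python loop over the query points.
    _, idx = tree.query(x_lost, k=k, p=1, **extra)
    # query() drops the last axis when k == 1; restore it so idx is always 2-D.
    idx = idx.reshape(len(x_lost), -1)
    # y_rest[idx] has shape (len(x_lost), k); take the mode along each row.
    result = mode(y_rest[idx], axis=1)
    # Depending on the SciPy version, result.mode has shape (n,) or (n, 1).
    return np.ravel(result.mode)
```

Calling predict_lost_labels(x_rest, y_rest, x_lost) should fill the same role as the original loop, and passing workers=-1 would ask query() to use all available cores on SciPy versions that support that parameter.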


1 Comment

This is exactly what I did! I rewrote the function in Cython, moved the query to a batch call, and used scipy.stats.mode with axis=1. I couldn't set workers sadly, since the computer cluster I'm using is on an older version of scipy, but doing this got me a 50x speedup, which was enough.
