0

I am playing with the KDQuery function in SciPy.Spatial. I have an issue once my data sizes get really large. I realize that the algorithm is not necessarily designed to be efficient for large datasets, but it appears (from the source) that size should only increase processing time, not impact output.

Here is a code snippet:

sizes = [ 10**i for i in range(5,6) ] #10^5 for this test
data = np.random.random_integers(0,100,(sizes[-1],2))
for size in sizes:
    kd = ps.common.KDTree(data)
    nnq = kd.query(data,k=2+1, p=2)
    info = nnq[1] #This is the indices of the neighbors
    neighbors = {}
    idset = np.arange(len(info)) #Indices of the input point
    for i, row in enumerate(info):
        row = row.tolist()
        row.remove(i)
        neighbors[idset[i]] = list(row)

This returns a value error when i is not in the list (ValueError list.remove(x): x not in list). For data sizes less than 10^5 this code works as expected.

One potential reason for the error is that the recursion limit is being reached. To explore this I set the recursion depth to 1,000,000 (sys.setrecursionlimit(1000000)). This does not alleviate the problem.

2
  • What is the ps.common namespace? Commented Mar 19, 2013 at 15:49
  • Just another file that handles the module level imports Commented Mar 19, 2013 at 21:12

1 Answer 1

1

The error occurs in your code, at the statement row.remove(i). The problem is that your random data set can have duplicate points, and sometimes the same point can be repeated more than three times. This becomes very likely when the data set is large. When that happens, the three nearest neighbors of a point might not include the point itself. That causes the error in row.remove(i).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.