Optimisation of Nearest Neighbour Algorithm

Question

Background:
Some time ago I asked a question here regarding the most efficient method for finding the neighbours of every point in a set of data by comparing each point against every other. I've attempted to optimise the process by breaking the coordinate list into separate sets based on their x coordinates, and grouping them in overlapping sets. The neighbour finder 'pairs_3' is then used on each set. This then gives me a full set of connections between vertices. The code below is a snippet of what I have so far:

def pairs_3(coordinates, vertex_spacing): #Function from previous question
    i=-1
    results_2 = []
    results_2 [:]= []
    for (xc, yc, zc) in enumerate(coordinates):
        for (xb, yb, zb) in enumerate(coordinates[ic:]):
            dx = xb - xc 
            if sqrt(dx**2) > a:
                    break
            dy = yb - yc
            dz = zb - zc

            if (sqrt(dx**2+dy**2+dz**2)) == vertex_spacing:
                i+=1
                results_2.append([(xc, yc, zc), (xb, yb, zb), width, dx, dy, dz, edge])
    return results_2

Coordinates_xyz=sorted(set(zip(Coord_x,Coord_y,Coord_z))) #Full 3D coordinate list
Coord_x_set=sorted(set(Coord_x))
numerical_width = Coord_x.count(Coord_x_set[1]) #gets number of columns
N=numerical_width
Col_subList = [Coordinates_xyz[n:n+2*N] for n in range(0,         len(Coordinates_xyz), N)] #Col_subList is a list of grouped data according to x coordinate. The data is always discrete, so this is possible.

new_p_2=[]
new_p_2[:]=[]
new_p_3=[]
new_p_3[:]=[]
#where a is vertex separation, a known value used to generate the data
for x in Col_subList:
   new_p_2.append(pairs_3(x, a))

for x in new_p_2:
    for y in x:
        new_p_3.append(y)

p_2=new_p_3
start=column(p_2,0) #Start Vertex
end=column(p_2, 1) #End Vertex
s_i=[] #Start Vertex Index
s_i[:]=[] 
c_i=[] #Connection Index
c_i[:]=[]
for i in start:
indices = [ind for ind, x in enumerate(Coordinates_xyz_final) if x == i]
for j in indices:
    s_i.append(j)

for i in xrange(len(start)):
c_i.append(i)
p_2 = zip(start, end, s_i, c_i) #Final form necessary for later calculations

Question:
Is there any way to further/better optimise this code? And are there any ways to further break down the problem that aren't more computationally expensive than the function itself?

Bonus Question:
Is there a way to efficiently integrate a 'divide and conquer' into the function itself? I.e break the list into sets based on columns inside the function?

pentadecagon · Accepted Answer · 2013-11-17 14:33:53Z

1

What you are looking for is called R-Tree. If you just want it as a tool to solve a problem I suggest you go for one of several existing implementations, Google turns up e.g. https://pypi.python.org/pypi/Rtree/.

answered Nov 17, 2013 at 14:33

pentadecagon

4,8972 gold badges21 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Luyi Tian · Accepted Answer · 2013-11-16 15:47:21Z

0

There are two ways without optimizing the algo:

use pypy if you script contains nested for loops or while loop
in this case the algo is simple, contains a lot for loops, contains a lot arithmetical operation, then it would be a good idea to write this in C. The simpliest way is to use SWIG.

also you could use scipy/numpy for math.

answered Nov 16, 2013 at 15:47

Luyi Tian

3711 silver badge12 bronze badges

Collectives™ on Stack Overflow

Optimisation of Nearest Neighbour Algorithm

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related