MySQL Spatial Join on Closest Point

Question

I've looked around a bit and found quite a few people seeking to order a table of points by distance to a set point, but I'm curious how one would go about efficiently joining two tables on the minimum distance between two points. In my case, consider the table nodes and centroids.

CREATE TABLE nodes (
    node_id VARCHAR(255),
    pt POINT
);
CREATE TABLE centroids (
    centroid_id MEDIUMINT UNSIGNED,
    temperature FLOAT,
    pt POINT
);

I have approximately 300k nodes and 15k centroids, and I want to get the closest centroid to each node so I can assign each node a temperature. So far I have created spatial indexes on pt on both tables and tried running the following query:

SELECT
    nodes.node_id,
    MIN(ST_DISTANCE(nodes.pt, centroids.pt))
FROM nodes
INNER JOIN centroids
ON ST_DISTANCE(nodes.pt, centroids.pt) <= 4810
GROUP BY
    nodes.node_id
LIMIT 10;

Clearly, this query is not going to solve my problem; it does not retrieve temperature, assumes that the closest centroid is within 4810, and only evaluates 10 nodes. However, even with these simplifications, this query is very poorly optimized, and is still running as I type this. When I have MySQL give details about the query, it says no indexes are being used and none of the spatial indexes are listed as possible keys.

How could I build a query that can actually return the data I want joined efficiently utilizing spatial indexes?

karmakaze · Accepted Answer · 2019-12-30 18:47:32Z

2

I think a good approach would be partitioning (numerically not db partitioning) the data into cells. I don't know how well spatial indexes applies here, but the high-level logic is to say bin each node and centroid point into square regions and find matches between all the node-centroid in the same square, then make sure that there isn't a closer match in an 8-adjacent square (e.g. using the same nodes in original square). The closest matches can then be used to compute and save the temperature. All subsequent queries should ignore nodes with the temperature set.

There will still be nodes with centroids that aren't within the same or 8-adjacent squares, you would then expand the search, perhaps use squares with double the width and height. I can see this working with plain indexes on just the x and y coordinate of the points. I don't know how spatial indexes can further improve this.

answered Dec 30, 2019 at 18:47

karmakaze

36.3k1 gold badge34 silver badges33 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Benjamin Brownlee Over a year ago

You mentioning cells brought up an idea. I know that MySQL spatial indexes can quickly find items within or surrounding a certain polygons. Could it be possible to convert my centroids to tessellation of polygons and then just search for the nodes that each polygon contains? Would this be much faster than trying to compute distance?

karmakaze Over a year ago

That sounds really cool, you know that all the nodes in a polygon will be closest to one of the vertex centroids. So now the problem becomes making a tessellation. How about just triangles. Remembering that the 'outside' (negative polygon of the mesh) is also a polygon for those nodes that aren't inside.

karmakaze Over a year ago

This doesn't actually work. since in `\/_` a point inside the right triangle very near the left side could be closer to the top-left point on the left triangle than any of the points in the right triangle. Maybe better to stick with squares.

Benjamin Brownlee Over a year ago

What about these? I could pull all the centroids and calculate their Voronoi polygons using python and push them back. A spatial index on those polygons should be fast. en.wikipedia.org/wiki/Voronoi_diagram

Benjamin Brownlee Over a year ago

Yes, the calculation of Voronoi polygons and pairing of temperatures to nodes using these regions was surprisingly fast. I can do the join I want on my entire network in just a couple seconds. I will accept your answer for your help here and explain what I did in my own answer to my question.

|

GMB · Accepted Answer · 2019-12-30 19:15:56Z

1

There are many ways to solve this least-n-per-group problem.

One method uses a self-left-join antipattern (this allows ties):

select 
    n.node_id,
    c.centroid_id,
    st_distance(n.pt, c.pt) dist,
    c.temperature
from nodes n
cross join centroids c
left join centroids c1 
    on c1.centroid_id <> c.centroid_id
    and st_distance(n.pt, c1.pt) < st_distance(n.pt, c.pt)
where c1.centroid_id is null

The same logic can be expressed with a not exists condition.

Another option is to use a correlated subquery for filtering (this does not allow ties):

select 
    n.node_id,
    n.node_id,
    c.centroid_id,
    st_distance(n.pt, c.pt) dist,
    c.temperature
from nodes n
inner join centroids c
    on c.centroid_id = (
        select c1.centroid_id
        from centroids c1
        order by st_distance(n.pt, c1.pt) 
        limit 1
    )

Finally: if all you want is the temperature of the closest centroid, then a simple subquery should be a good choice:

select 
    n.node_id,
    (
        select c1.temperature
        from centroids c1
        order by st_distance(n.pt, c1.pt) 
        limit 1
    ) temperature 
from nodes n

edited Dec 30, 2019 at 19:15

answered Dec 30, 2019 at 18:42

GMB

224k25 gold badges103 silver badges151 bronze badges

1 Comment

Benjamin Brownlee Over a year ago

Thanks for the working queries. They are definitely runnable with low limits, but MySQL still doesn't seem to be using any indexes. 100 nodes took about 2.14s, 1000 took 21.33s, and 10000 took 212.94s. So a full network (500k nodes once I update it) should take about 3hrs, which definitely manageable, but it would great to optimize it further. Any ideas how to actually utilize mysql spatial indexes, or is this not what they are designed for?

Collectives™ on Stack Overflow

MySQL Spatial Join on Closest Point

2 Answers 2

7 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related