Optimising a spatial MySQL query with POINT

Question

I have two tables in MySQL 5.7 that look like this:

create table places
(
    id int auto_increment primary key,
    position point null comment 'Coordinates of the city.',

    constraint places_position_uindex
        unique (position)
);

create table place_names
(
    id int auto_increment primary key,
    place_id int not null comment 'ID of place in table places.',
    name char(255) not null comment 'Name of the place in the given language.',
    country char(255) not null comment 'Name of the place''s country in the given language.',
    language char(3) not null comment 'ISO 3 code of the language this record is in.'
);

create index place_names_language_index
    on place_names (language);

create index place_names_name_language_index
    on place_names (name, language);

I'm building a query that fetches the names of the places nearest to a given point. Currently I have:

SELECT
name,
ST_DISTANCE_SPHERE(position, p.point) AS distance,
administration,
country
FROM place_names
JOIN places ON place_names.place_id = places.id
JOIN (
    SELECT
       POINT(?, ?) AS point
) AS p
WHERE language = 'ENG'
ORDER BY distance
LIMIT 10;

If I EXPLAIN this query I get:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived2>	NULL	ALL	NULL	NULL	NULL	NULL	1	100	Using temporary; Using filesort
1	PRIMARY	place_names	NULL	ref	place_names_language_index	place_names_language_index	12	const	1368960	100	NULL
1	PRIMARY	places	NULL	eq_ref	PRIMARY	PRIMARY	4	msdplaces.place_names.place_id	1	100	NULL
2	DERIVED	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	No tables used

As you can see, the table is quite large (1368960 rows) and will grow much larger in the future. I would like to reduce the number of rows examined as much as possible, for example by first restricting them to a radius of 80 km, or even just 1 degree of latitude/longitude, around the given point, and only then computing ST_DISTANCE_SPHERE between that point and the remaining rows. Any other optimisation that makes the query faster is also welcome, since at the moment it is unusably slow.

All I've found on the internet so far predates version 5.7, so it computes distances manually with trigonometry instead of using the native POINT data type and the ST_DISTANCE_SPHERE function. These should be much faster than doing the trigonometry by hand, so I'd like to keep them, but I'm not opposed to splitting the POINT column into separate latitude and longitude columns if that would bring an advantage.

How can I optimise this query so that the table size affects performance as little as possible?

EDIT: I added a spatial index on position:

create spatial index position
    on places (position);

and changed the query to the following to make use of the index, but it does not seem to be used at all:

explain select
name,
ST_Distance_Sphere(position, p.point) as distance,
administration,
country
FROM place_names
join places on place_names.place_id = places.id
join (
    select
       POINT(30.5315, 56.3396) as point
) as p
WHERE
      MBRContains(ST_GeomFromText('Polygon((29.0 55.0, 29.0 57.0, 31.0 57.0, 31.0 55.0, 29.0 55.0))'), places.position)
and
      language = 'ENG'
order by distance
limit 10;

(Note that in order to add the index I had to make position NOT NULL.) The result:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived2>	NULL	ALL	NULL	NULL	NULL	NULL	1	100	Using where; Using temporary; Using filesort
1	PRIMARY	place_names	NULL	ref	place_names_language_index	place_names_language_index	12	const	1368960	100	NULL
1	PRIMARY	places	NULL	eq_ref	PRIMARY	PRIMARY	4	mydb.place_names.place_id	1	100	NULL
2	DERIVED	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	No tables used

The result looks the same as without the MBRContains() part of the query, and I still see that dreaded "rows = 1368960". As I understand it, that means the rows are not being restricted by the clause at all. I also tried swapping the FROM and JOIN tables so that places is the main table, but nothing changed.

You may create a spatial index and use MBRWithin() for pre-filtering. dev.mysql.com/doc/refman/5.7/en/using-spatial-indexes.html and dev.mysql.com/doc/refman/5.7/en/…
– Akina, Commented May 21, 2021 at 7:57
Alternatively, you may divide your map into 80 km squares, pre-calculate the square number for each point, and pre-filter to points that lie in the same square as the given point or in an adjacent one (sharing a side or a corner).
– Akina, Commented May 21, 2021 at 8:00
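Akina's grid idea can be sketched with stored generated columns, so the cell numbers stay in sync with position automatically. This is a sketch under assumptions: it uses 1-degree cells instead of 80 km squares for simplicity, and the column names cell_x/cell_y and the index name are hypothetical. A 3×3 block of 1-degree cells only guarantees coverage of one cell in every direction from the search point: about 111 km north-south but only about 111·cos(latitude) km east-west (roughly 62 km at latitude 56°), so a larger radius needs a wider block or larger cells.

```sql
-- Hypothetical grid columns: the 1-degree cell each place falls into.
-- STORED generated columns are recomputed whenever position changes.
ALTER TABLE places
    ADD COLUMN cell_x SMALLINT AS (FLOOR(ST_X(position))) STORED,
    ADD COLUMN cell_y SMALLINT AS (FLOOR(ST_Y(position))) STORED;

-- Hypothetical index name, following the schema's naming style.
CREATE INDEX places_cell_y_cell_x_index
    ON places (cell_y, cell_x);

-- Search point: X = longitude, Y = latitude, as in the question.
SET @lon = 30.5315, @lat = 56.3396;

-- Pre-filter to the search point's cell and its 8 neighbours via the index,
-- then compute the exact spherical distance only for those candidates.
SELECT
    id,
    ST_Distance_Sphere(position, POINT(@lon, @lat)) AS distance
FROM places
WHERE cell_y IN (FLOOR(@lat) - 1, FLOOR(@lat), FLOOR(@lat) + 1)
  AND cell_x IN (FLOOR(@lon) - 1, FLOOR(@lon), FLOOR(@lon) + 1)
ORDER BY distance
LIMIT 10;
```

Using IN lists on both index columns lets the range optimizer look up the nine cells individually instead of scanning a whole latitude band.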
"find nearest": mysql.rjweb.org/doc.php/find_nearest_in_mysql — Rick James
– Rick James, Commented May 21, 2021 at 18:07
Use a spatial index and MBRContains. That will avoid the nasty table scan. Check out this dba.stackexchange.com/questions/260757/…
– O. Jones, Commented May 21, 2021 at 22:47
Lines of longitude get closer together as you get nearer the poles, so Cartesian distance calculations become inaccurate. Here's an explanation: stackoverflow.com/questions/67318013/…
– O. Jones, Commented May 25, 2021 at 12:49
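Turning a radius into a latitude/longitude box is a short calculation. Because meridians converge towards the poles, one degree of longitude spans only cos φ₀ times the distance of one degree of latitude. Assuming a spherical Earth with R = 6370986 m (the default radius ST_Distance_Sphere uses), a radius d around a point at latitude φ₀ gives the half-widths below, worked for the question's 80 km around POINT(30.5315, 56.3396):

```latex
\Delta\varphi = \frac{180^\circ}{\pi}\cdot\frac{d}{R} \approx \frac{d}{111.19\ \mathrm{km}},
\qquad
\Delta\lambda \approx \frac{\Delta\varphi}{\cos\varphi_0}

\Delta\varphi \approx \frac{80}{111.19} \approx 0.72^\circ,
\qquad
\Delta\lambda \approx \frac{0.72^\circ}{\cos 56.34^\circ} \approx \frac{0.72^\circ}{0.554} \approx 1.30^\circ
```

So an 80 km box at that latitude is roughly 1.44° tall but 2.60° wide. The Δλ formula is an approximation that breaks down close to the poles, where cos φ₀ approaches zero.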

theberzi · Accepted Answer · 2021-05-28 08:03:05Z

Turns out that to solve the issue, I needed to:

1. Make the position column NOT NULL, which the SPATIAL index requires. POINT columns do not support DEFAULT, so I manually set all NULL values to POINT(0, 0), and I will have to do the same when inserting records. Then add the index:
   ALTER TABLE places ADD SPATIAL INDEX (position);
2. Use MBRContains() to restrict the query to fewer rows based on position. MBRWithin() would also work, with its two arguments swapped. In practice I will have to build the bounding box from the given latitude and longitude instead of hard-coding it.
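The NOT NULL change described above can be sketched as two statements. The column comment is restated because MODIFY replaces the whole column definition. One caution the answer does not mention: places_position_uindex makes position unique, so the UPDATE fails with a duplicate-key error if more than one row is NULL, and each such row would then need a distinct placeholder point.

```sql
-- Replace NULL positions with the POINT(0, 0) placeholder used in the answer.
-- Caution: with the UNIQUE index on position, this fails if 2+ rows are NULL.
UPDATE places
SET position = POINT(0, 0)
WHERE position IS NULL;

-- Forbid NULLs (required before adding the SPATIAL index). MODIFY replaces
-- the whole column definition, so the existing comment is restated.
ALTER TABLE places
    MODIFY position POINT NOT NULL COMMENT 'Coordinates of the city.';
```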

That alone didn't seem to work, but then I found out that the main issue was not the spatial column but the join: the place_id column had no index! Whoops.
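A sketch of the missing index. The answer only says place_id had no index; indexing (place_id, language) together is an assumption here that also serves the language = 'ENG' filter, and the index name is hypothetical. The EXPLAIN afterwards is just a way to check which table the optimizer now reads first.

```sql
-- Index the join column so each place found through the spatial index can
-- look up its names directly instead of scanning place_names.
-- Adding `language` as a second column also serves pn.language = 'ENG'.
CREATE INDEX place_names_place_id_language_index
    ON place_names (place_id, language);

-- Re-check the plan; ideally places is now read first via its spatial index.
EXPLAIN
SELECT p.id, pn.name
FROM places p
JOIN place_names pn ON pn.place_id = p.id
WHERE MBRContains(ST_GeomFromText('Polygon((29 55, 29 57, 31 57, 31 55, 29 55))'), p.position)
  AND pn.language = 'ENG';
```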

So this is the final query I ended up with:

SELECT
p.id,
ST_Distance_Sphere(p.position, POINT(30.5315, 56.3396)) AS distance,
pn.name,
pn.administration,
pn.country
FROM (
    SELECT id, position
    FROM places
    WHERE MBRContains(ST_GeomFromText('Polygon((29 55, 29 57, 31 57, 31 55, 29 55))'), position)
) p
JOIN place_names pn ON p.id = pn.place_id
WHERE pn.language = 'ENG'
ORDER BY distance
LIMIT 10;
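The final query hard-codes the box. Below is a sketch of building it from a centre point and a radius, as the answer says it will have to. It assumes the hypothetical session variables @lon, @lat and @radius_km, and the 111.19 km-per-degree figure that follows from ST_Distance_Sphere's default radius of 6370986 m. ST_MakeEnvelope() (MySQL 5.7.6 and later) replaces the hand-written polygon, and a final distance check turns the box into a true circle. Whether the optimizer still uses the spatial index with variables in the envelope should be confirmed with EXPLAIN.

```sql
-- Hypothetical inputs: search centre (X = longitude, Y = latitude) and radius.
SET @lon = 30.5315, @lat = 56.3396, @radius_km = 80;

-- Half-widths of the box in degrees; longitude degrees shrink by cos(latitude).
SET @dlat = @radius_km / 111.19;
SET @dlon = @radius_km / (111.19 * COS(RADIANS(@lat)));

SELECT
    p.id,
    ST_Distance_Sphere(p.position, POINT(@lon, @lat)) AS distance,
    pn.name,
    pn.administration,
    pn.country
FROM places p
JOIN place_names pn ON pn.place_id = p.id
-- Cheap pre-filter on the SPATIAL index: the box around the circle.
WHERE MBRContains(
          ST_MakeEnvelope(POINT(@lon - @dlon, @lat - @dlat),
                          POINT(@lon + @dlon, @lat + @dlat)),
          p.position)
  -- Exact filter: drop the box corners outside the radius (metres).
  AND ST_Distance_Sphere(p.position, POINT(@lon, @lat)) <= @radius_km * 1000
  AND pn.language = 'ENG'
ORDER BY distance
LIMIT 10;
```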

Thanks to Rick James and Akina for the advice and pointers. Hopefully this will be of help to others passing by.

Rick James · Accepted Answer · 2021-05-21 18:13:37Z

What you have must scan all 1368960 points and check the distance to each one. This is time consuming.

All optimizations involve limiting the search to a "bounding box". The following shows a method using a SPATIAL index, plus 4 others.

http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

answered May 21, 2021 at 18:13

Rick James

theberzi Over a year ago

That method assumes the table stores latitude and longitude separately, whereas I have a single position column of type POINT. I could run it by extracting X and Y from the points, but I have no index on those values, so that would end up being extremely inefficient, right?

Rick James Over a year ago

@theberzi - With "Point" you are primed for using a SPATIAL index. See the option for that. It is quite efficient, especially since you have "Points" already. Let me know if you need further guidance.

theberzi Over a year ago

Yes, I added a spatial index, but I can't seem to make use of it. Please check the EDIT part of my original question.

Rick James Over a year ago

You might need floats instead of ints in that ST_GeomFromText() string.

Rick James Over a year ago

And please provide SHOW CREATE TABLE so we can see the SPATIAL index.
