I have two tables on MySQL 5.7 that look like this:

create table places
(
    id int auto_increment primary key,
    position point null comment 'Coordinates of the city.',

    constraint places_position_uindex
        unique (position)
);

create table place_names
(
    id int auto_increment primary key,
    place_id int not null comment 'ID of place in table places.',
    name char(255) not null comment 'Name of the place in the given language.',
    country char(255) not null comment 'Name of the place''s country in the given language.',
    language char(3) not null comment 'ISO 3 code of the language this record is in.'
);

create index place_names_language_index
    on place_names (language);

create index place_names_name_language_index
    on place_names (name, language);

And I'm building a query to fetch a given place's name based on the distance from a given point. I currently have:

SELECT
name,
ST_DISTANCE_SPHERE(position, p.point) AS distance,
administration,
country
FROM place_names
JOIN places ON place_names.place_id = places.id
JOIN (
    SELECT
       POINT(?, ?) AS point
) AS p
WHERE language = 'ENG'
ORDER BY distance
LIMIT 10;

If I EXPLAIN this query I get:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100 | Using temporary; Using filesort
1 | PRIMARY | place_names | NULL | ref | place_names_language_index | place_names_language_index | 12 | const | 1368960 | 100 | NULL
1 | PRIMARY | places | NULL | eq_ref | PRIMARY | PRIMARY | 4 | msdplaces.place_names.place_id | 1 | 100 | NULL
2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used

As you can see, the table is quite large (1368960 rows) and will become much larger in the future. I would like to reduce the number of rows examined as much as possible, for example by limiting them to a radius of 80 km, or even just 1 degree of longitude/latitude around the given point, before computing ST_DISTANCE_SPHERE between the point and the rows. I'm also open to any other optimisation that makes the query faster, since at the moment it's unusably slow.

All I've found on the internet so far predates version 5.7, so it computes distances manually instead of using the native POINT datatype and the ST_DISTANCE_SPHERE function. These are much faster than handling trigonometry manually, so I'd like to keep them, but I'm not opposed to splitting the POINT column into separate latitude and longitude columns if that brings an advantage.

How can I optimise this query such that table size will impact performance as little as possible?

EDIT: I added a spatial index on position

create spatial index position
    on places (position);

and changed the query to the following to try and make use of the index, but it seems like it's not getting used at all:

explain select
name,
ST_Distance_Sphere(position, p.point) as distance,
administration,
country
FROM place_names
join places on place_names.place_id = places.id
join (
    select
       POINT(30.5315, 56.3396) as point
) as p
WHERE
      MBRContains(ST_GeomFromText('Polygon((29.0 55.0, 29.0 57.0, 31.0 57.0, 31.0 55.0, 29.0 55.0))'), places.position)
and
      language = 'ENG'
order by distance
limit 10;

(Note that in order to add the index I had to make position NOT NULL.) The result:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100 | Using where; Using temporary; Using filesort
1 | PRIMARY | place_names | NULL | ref | place_names_language_index | place_names_language_index | 12 | const | 1368960 | 100 | NULL
1 | PRIMARY | places | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.place_names.place_id | 1 | 100 | NULL
2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used

The result seems the same as without the MBRContains() part of the query, and I still see that dreaded "rows = 1368960". As I understand it, that means the rows are not getting restricted by the clause at all. I also tried swapping the FROM and JOIN tables so that places is the main table, but nothing changes.

2 Answers

1

Turns out that to solve the issue, what I needed was to:

  1. Make the position column NOT NULL, which is a requirement for the spatial index. (POINT does not support DEFAULT, so I manually set all null values to POINT(0, 0) and will have to do so when inserting records too.)
  2. ALTER TABLE places ADD SPATIAL INDEX (position).
  3. Use MBRContains() to restrict the query to fewer rows based on position. Of course, MBRWithin() would also work, with the arguments swapped. In practice I will have to construct the bounding box from the latitude and longitude manually.
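
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration under the schema shown in the question; the exact statements I ran are not reproduced here, and the caveat about the unique index is my assumption about how the existing places_position_uindex would behave.

```sql
-- Step 1: replace NULLs with a placeholder point, then forbid NULLs.
-- Caveat (assumption): while the unique index places_position_uindex exists,
-- more than one POINT(0, 0) row would presumably be rejected as a duplicate.
UPDATE places
SET position = POINT(0, 0)
WHERE position IS NULL;

ALTER TABLE places
    MODIFY position POINT NOT NULL COMMENT 'Coordinates of the city.';

-- Step 2: add the spatial index; MySQL chooses the index name when none is given.
ALTER TABLE places
    ADD SPATIAL INDEX (position);
```

SHOW CREATE TABLE places should then list a SPATIAL KEY on position.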

That alone didn't seem to work, but then I found out that the main issue was not on the spatial column but on the join: the place_id column had no index! Whoops.
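
A minimal sketch of the missing index follows. The index names are hypothetical, and the commented-out composite variant is an assumption of mine rather than something I tested.

```sql
-- Index the join column so each place found by the spatial filter
-- can look up its names without scanning place_names.
CREATE INDEX place_names_place_id_index
    ON place_names (place_id);

-- Possible alternative (assumption): a composite index that also
-- covers the language filter used in the query.
-- CREATE INDEX place_names_place_id_language_index
--     ON place_names (place_id, language);
```

Re-running EXPLAIN afterwards shows whether the rows estimate for place_names has dropped.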

So this is the final query I ended up with:

SELECT
p.id,
ST_Distance_Sphere(p.position, POINT(30.5315, 56.3396)) AS distance,
pn.name,
pn.administration,
pn.country
FROM (
    SELECT id, position
    FROM places
    WHERE MBRContains(ST_GeomFromText('Polygon((29 55, 29 57, 31 57, 31 55, 29 55))'), position)
) p
JOIN place_names pn ON p.id = pn.place_id
WHERE pn.language = 'ENG'
ORDER BY distance
LIMIT 10;
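
The hard-coded polygon can be derived from a centre point and a radius. Below is a rough sketch, not the exact code I use: the user variables @lon, @lat, @radius_km, @dlat, @dlon and @poly are names made up for illustration, and the 111.2 km per degree figure is a spherical approximation. It assumes the search area stays away from the poles and the 180° meridian. The administration column is left out because it does not appear in the CREATE TABLE shown in the question.

```sql
-- Hypothetical inputs: search centre (POINT takes x = longitude, y = latitude)
-- and radius in kilometres.
SET @lon = 30.5315;
SET @lat = 56.3396;
SET @radius_km = 80;

-- Roughly 111.2 km per degree of latitude on a sphere;
-- a degree of longitude shrinks by cos(latitude).
SET @dlat = @radius_km / 111.2;
SET @dlon = @radius_km / (111.2 * COS(RADIANS(@lat)));

-- Closed ring in "longitude latitude" order: the last vertex repeats the first.
SET @poly = CONCAT(
    'POLYGON((',
    @lon - @dlon, ' ', @lat - @dlat, ', ',
    @lon - @dlon, ' ', @lat + @dlat, ', ',
    @lon + @dlon, ' ', @lat + @dlat, ', ',
    @lon + @dlon, ' ', @lat - @dlat, ', ',
    @lon - @dlon, ' ', @lat - @dlat,
    '))');

SELECT
    p.id,
    ST_Distance_Sphere(p.position, POINT(@lon, @lat)) AS distance,
    pn.name,
    pn.country
FROM places p
JOIN place_names pn ON pn.place_id = p.id
WHERE MBRContains(ST_GeomFromText(@poly), p.position)  -- coarse filter on the bounding box
  AND pn.language = 'ENG'
HAVING distance <= @radius_km * 1000                   -- drop the corners outside the radius
ORDER BY distance
LIMIT 10;
```

The box is only a coarse pre-filter, so the HAVING clause trims the results to the actual radius; ST_Distance_Sphere returns metres, hence the multiplication by 1000.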

Thanks to Rick James and Akina for the advice and pointers. Hopefully this will be of help to others passing by.

0

As written, the query must scan all 1368960 rows and compute the distance to each one, which is time-consuming.

All optimizations involve limiting the search to a "bounding box". The following shows a method using a SPATIAL index, plus 4 others.

http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

8 Comments

That method assumes the table stores latitude and longitude separately, whereas in my case I have a single position column of type point. I could run that by extracting x and y from the points, but I don't have an index on the results, so that would end up being extremely inefficient, right?
@theberzi - With "Point" you are primed for using a SPATIAL index. See the option for that. It is quite efficient, especially since you have "Points" already. Let me know if you need further guidance.
Yes, I added a spatial index, but I can't seem to make use of it. Please check the EDIT part of my original question.
You might need floats instead of ints in that "from text" string.
And please provide SHOW CREATE TABLE so we can see the SPATIAL index.
