I have two tables on MySQL 5.7 that look like this:
create table places
(
    id int auto_increment primary key,
    position point null comment 'Coordinates of the city.',
    constraint places_position_uindex
        unique (position)
);

create table place_names
(
    id int auto_increment primary key,
    place_id int not null comment 'ID of place in table places.',
    name char(255) not null comment 'Name of the place in the given language.',
    country char(255) not null comment 'Name of the place''s country in the given language.',
    language char(3) not null comment 'ISO 3 code of the language this record is in.'
);

create index place_names_language_index
    on place_names (language);

create index place_names_name_language_index
    on place_names (name, language);
I'm building a query that fetches the names of the places closest to a given point, ordered by distance. I currently have:
SELECT
    name,
    ST_DISTANCE_SPHERE(position, p.point) AS distance,
    administration,
    country
FROM place_names
JOIN places ON place_names.place_id = places.id
JOIN (
    SELECT
        POINT(?, ?) AS point
) AS p
WHERE language = 'ENG'
ORDER BY distance
LIMIT 10;
If I EXPLAIN this query I get:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100 | Using temporary; Using filesort |
| 1 | PRIMARY | place_names | NULL | ref | place_names_language_index | place_names_language_index | 12 | const | 1368960 | 100 | NULL |
| 1 | PRIMARY | places | NULL | eq_ref | PRIMARY | PRIMARY | 4 | msdplaces.place_names.place_id | 1 | 100 | NULL |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
As you can see, the table is quite large (1368960 rows) and will become much larger in the future. I would like to reduce the number of rows examined as much as possible, for example by limiting them to a radius of 80 km, or even just to 1 degree of longitude/latitude around the given point, before computing the ST_DISTANCE_SPHERE between the point and the remaining rows. I'm open to any other optimisation that makes the query faster, since at the moment it's unusably slow.
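To make the pre-filter idea concrete, here is a minimal sketch (in Python, purely illustrative) of how the application could turn a centre point and a radius into a bounding-box polygon in WKT, which could then be bound as a query parameter. The function name `bounding_box_wkt` and the constant of roughly 111.32 km per degree of latitude are my own assumptions, and the sketch ignores the poles and the 180° meridian.

```python
import math

# Assumption: approximate length of one degree of latitude, in kilometres.
KM_PER_DEGREE_LAT = 111.32


def bounding_box_wkt(lon, lat, radius_km):
    """Return a WKT polygon ("lon lat" order) that encloses a circle of
    radius_km around the given point. Hypothetical helper, not part of MySQL."""
    dlat = radius_km / KM_PER_DEGREE_LAT
    # One degree of longitude shrinks with cos(latitude); the clamp avoids
    # dividing by (almost) zero close to the poles.
    cos_lat = max(math.cos(math.radians(lat)), 1e-6)
    dlon = radius_km / (KM_PER_DEGREE_LAT * cos_lat)

    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat

    # The ring visits all four corners and closes on its starting corner.
    return (
        f"POLYGON(({west:.6f} {south:.6f}, {west:.6f} {north:.6f}, "
        f"{east:.6f} {north:.6f}, {east:.6f} {south:.6f}, "
        f"{west:.6f} {south:.6f}))"
    )


if __name__ == "__main__":
    # An 80 km box around an example point (longitude first, then latitude).
    print(bounding_box_wkt(30.5315, 56.3396, 80))
```

The box is slightly larger than the circle it encloses, so the exact distance would still have to be checked on the rows that survive the box filter.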
All I've found on the internet so far predates version 5.7, so it computes distances manually instead of using the native POINT datatype and the ST_DISTANCE_SPHERE function. These are much faster than handling the trigonometry manually, so I'd like to keep them, but I'm not opposed to splitting the POINT column into separate latitude and longitude columns if that would bring an advantage.
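For reference, the manual calculation in those older answers is usually some form of the haversine formula. A minimal sketch of it in Python follows; the function name `haversine_m` is my own, and the radius of 6,370,986 m is an assumption taken from what I understand to be the default radius of ST_Distance_Sphere, so the results should be roughly comparable.

```python
import math

# Assumption: sphere radius in metres, chosen to match what I understand to be
# the default radius used by MySQL's ST_Distance_Sphere.
EARTH_RADIUS_M = 6_370_986


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() guards against rounding pushing the value marginally above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Distance between two example points, longitude first.
    print(haversine_m(30.5315, 56.3396, 31.0, 57.0))
```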
How can I optimise this query such that table size will impact performance as little as possible?
EDIT:
I added a spatial index on position:
create spatial index position
on places (position);
and changed the query to the following to try and make use of the index, but it seems like it's not getting used at all:
EXPLAIN SELECT
    name,
    ST_Distance_Sphere(position, p.point) AS distance,
    administration,
    country
FROM place_names
JOIN places ON place_names.place_id = places.id
JOIN (
    SELECT
        POINT(30.5315, 56.3396) AS point
) AS p
WHERE
    MBRContains(ST_GeomFromText('Polygon((29.0 55.0, 29.0 57.0, 31.0 57.0, 29.0 57.0, 29.0 55.0))'), places.position)
    AND language = 'ENG'
ORDER BY distance
LIMIT 10;
(Note that in order to add the index I had to make position NOT NULL.) The result:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | place_names | NULL | ref | place_names_language_index | place_names_language_index | 12 | const | 1368960 | 100 | NULL |
| 1 | PRIMARY | places | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.place_names.place_id | 1 | 100 | NULL |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
The result looks the same as without the MBRContains() condition, and I still see that dreaded "rows = 1368960". As I understand it, that means the condition is not restricting the rows at all. I also tried swapping the FROM and JOIN tables so that places is the main table, but nothing changes.