I'm working on simplifying ZIP code polygons in a MySQL database (v8.0) by reducing the number of coordinates in each polygon.
I have a table named zip_city with a column named boundary, which holds the original multipolygons, and I added another column, boundary_simplified, for the simplified polygons. Both have SRID 4326 (I've included the is_point column because it might be relevant):
+---------------------+--------------------------------+------+-----+---------+----------------+
| Field               | Type                           | Null | Key | Default | Extra          |
+---------------------+--------------------------------+------+-----+---------+----------------+
| boundary            | multipolygon                   | NO   | MUL | NULL    |                |
| is_point            | tinyint unsigned               | NO   | MUL | 0       |                |
| boundary_simplified | multipolygon                   | NO   | MUL | NULL    |                |
+---------------------+--------------------------------+------+-----+---------+----------------+
Running a SHOW INDEXES, I have this:
mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name                  | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city |          1 | idx_is_point              |            1 | is_point            | A         |           2 |     NULL | NULL   |      | BTREE      |         |               | YES     | NULL       |
| zip_city |          1 | boundary                  |            1 | boundary            | A         |       34287 |       32 | NULL   |      | SPATIAL    |         |               | YES     | NULL       |
| zip_city |          1 | boundary_simplified       |            1 | boundary_simplified | A         |       34287 |       32 | NULL   |      | SPATIAL    |         |               | YES     | NULL       |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Both spatial indexes look identical, but a query using ST_Contains performs very differently depending on which column it targets. For example:
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND is_point = 0 LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)
And when I explain both queries, I see that the one using boundary_simplified is not using the index:
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys         | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | range | idx_is_point,boundary | boundary | 34      | NULL |    1 |    50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key          | key_len | ref   | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | ref  | idx_is_point  | idx_is_point | 1       | const | 17143 |   100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Any clue about this? I feel like I'm missing something simple, but I cannot find any information about it. Also, creating the index on the boundary column takes ~23.25 sec, while for boundary_simplified it takes only ~0.75 sec, which seems weird. Do the coordinates affect the efficiency of the index?
I've tried dropping both indexes and creating them separately. I tested the behavior without the indexes, which changed as expected. I also tried FORCE INDEX and USE INDEX in the query, which resulted in the same or worse behavior.
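For anyone trying to reproduce this, below is a rough sketch of checks that could be run to compare the two columns. It is only a diagnostic outline, not a confirmed fix. It assumes zip_city lives in the currently selected schema. As far as I understand the MySQL 8.0 documentation, the optimizer only considers a SPATIAL index when the column itself is declared with an SRID attribute, so the first query looks at the column definitions and the second looks at the SRIDs of the stored values, which may not be the same thing. The third query is an extra sanity check on the simplified geometries.

```sql
-- Column-level SRID restriction: SRS_ID is NULL when the column
-- was declared without an SRID attribute.
SELECT COLUMN_NAME, SRS_ID, GEOMETRY_TYPE_NAME
FROM information_schema.ST_GEOMETRY_COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'zip_city';

-- SRIDs actually carried by the stored values, per column.
SELECT ST_SRID(boundary)            AS srid_boundary,
       ST_SRID(boundary_simplified) AS srid_simplified,
       COUNT(*)                     AS n
FROM zip_city
GROUP BY srid_boundary, srid_simplified;

-- Number of simplified geometries that are not geometrically valid.
SELECT COUNT(*) AS invalid_simplified
FROM zip_city
WHERE ST_IsValid(boundary_simplified) = 0;
```

SHOW CREATE TABLE zip_city; would show the same column-level SRID information as the first query, in DDL form.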
EDIT: I fixed the indexes shown thanks to user1191247's observation. Also, I'm not showing the full table definition, as the rest is not relevant.
SHOW INDEXES FROM zip_city; does not include idx_is_point, which is listed as a possible key in both explain plans. Please add the DDL for zip_city to your question, along with some DML. The lack of time to build the boundary_simplified index does seem suspicious.