0

I'm working in simplifying zip codes polygons in a MYSQL database (v 8.0), I'm reducing the number of coordinates for each polygon.

So, I have a table named zip_city, which contains a column named boundary, which is the original multipolygon column, and I created another one with the simplified polygons, boundary_simplified. Both of them have SRID 4326 (I've included the is_point column because it might be important):

+---------------------+--------------------------------+------+-----+---------+----------------+
| Field               | Type                           | Null | Key | Default | Extra          |
+---------------------+--------------------------------+------+-----+---------+----------------+

| boundary            | multipolygon                   | NO   | MUL | NULL    |                |
| is_point            | tinyint unsigned               | NO   | MUL | 0       |                |
| boundary_simplified | multipolygon                   | NO   | MUL | NULL    |                |
+---------------------+--------------------------------+------+-----+---------+----------------+

Running a SHOW INDEXES, I have this:

mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name                  | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city |          1 | idx_is_point              |            1 | is_point            | A         |           2 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| zip_city |          1 | boundary                  |            1 | boundary            | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
| zip_city |          1 | boundary_simplified       |            1 | boundary_simplified | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

which looks exactly the same, but when I try to run a query using st_contains, it does not work the same for them, for example:

mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND          is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
         is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)

And when I explain both queries, I see that the one using boundary_simplified is not using the index:

mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
      is_point = 0      LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys         | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | range | idx_is_point,boundary | boundary | 34      | NULL |    1 |    50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}')) 
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key          | key_len | ref   | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | ref  | idx_is_point  | idx_is_point | 1       | const | 17143 |   100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Any clue on this? I feel like I'm missing something simple but I cannot find information about this. Also when creating the index, for the boundary column takes ~23.25 sec and for the boundary_simplified it takes only ~0.75 sec (which is weird. Do the coordinates affect the efficiency of the index?)

I've tried deleting both indexes and creating them separately, I tested the behavior w/o the index which changed of course, I've tried to use FORCE INDEX or USE INDEX inside the query which resulted in same/worse behavior.

EDIT: I fixed the indexes shown thanks to user1191247 observation. Also, I'm not showing the full table information as it is useless.

2
  • The information in your question is inconsistent. SHOW INDEXES FROM zip_city; does not include idx_is_point, which is listed as a possible key in both explain plans. Please add the DDL for zip_city to your question, along with some DML. The lack of time to build the boundary_simplified index does seem suspicious. Commented Oct 4, 2023 at 15:54
  • 1
    Thanks user1191247! Yes, I didn't want to add all the information to avoid a vast question, but indeed, I wanted to add that index you mentioned (idx_is_point) but added the wrong one (idx_location). Now, looking for the information you asked I've found what was the issue. I'm going to answer myself this question to show the solution. Thanks a lot! Commented Oct 4, 2023 at 18:01

1 Answer 1

0

Thanks to user1191247 comment, I looked for the information he asked and I've found this:

| zip_city | CREATE TABLE `zip_city` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `state_id` int unsigned NOT NULL,
  `zip` mediumint(5) unsigned zerofill NOT NULL,
  `city` varchar(64) NOT NULL,
  `slug` varchar(64) NOT NULL,
  `location` point NOT NULL /*!80003 SRID 4326 */,
  `boundary` multipolygon NOT NULL /*!80003 SRID 4326 */,
  `is_point` tinyint unsigned NOT NULL DEFAULT '0',
  `fit_market` tinyint unsigned NOT NULL DEFAULT '0',
  `boundary_simplified` multipolygon NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_zip_to_city_state1_idx` (`state_id`),
  KEY `idx_zip` (`zip`),
  KEY `idx_slug` (`slug`),
  KEY `idx_city` (`city`),
  SPATIAL KEY `idx_location` (`location`),
  SPATIAL KEY `boundary` (`boundary`),
  SPATIAL KEY `boundary_simplified` (`boundary_simplified`),
  CONSTRAINT `fk_zip_to_city_state1` FOREIGN KEY (`state_id`) REFERENCES `state` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=41381 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |

where, as you can see, boundary_simplified is missing the SRID definition, which is crucial for the index to work properly (with SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city; I had obtained the SRID 4326, so I didn't think this was the issue, but it was missing on the column definition). I've solved this by running these queries:

DROP INDEX boundary_simplified ON zip_city;

ALTER TABLE zip_city MODIFY COLUMN boundary_simplified MULTIPOLYGON NOT NULL SRID 4326; 

(took ~53 sec)

ALTER TABLE zip_city ADD SPATIAL INDEX idx_boundary_simplified (boundary_simplified); 

(now it took ~24 sec which was already good news)

Then the INDEX worked perfectly :)

Sign up to request clarification or add additional context in comments.

1 Comment

I am glad you figured it out. Happy coding!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.