1

I have a table called cities:

CREATE TABLE `cities` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `city` varchar(255) DEFAULT NULL,
  `state_code` varchar(10) DEFAULT NULL,
  `country_code` varchar(10) DEFAULT NULL,
  `latitude` double DEFAULT NULL,
  `longitude` double DEFAULT NULL,
  `geom` geometry NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  KEY `country_code` (`country_code`),
  SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It contains around 4000 rows, none of which have the state_code field populated. I'm trying to fill that field using data I have in another table called geodata:

CREATE TABLE `geodata` (
  `region_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `country_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `state_code` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `geom` geometry NOT NULL,
  PRIMARY KEY (`region_id`),
  KEY `country_code` (`country_code`),
  KEY `state_code` (`state_code`),
  SPATIAL KEY `geom` (`geom`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

That table has ~1500 rows with the polygon data for each of the US states and Canadian provinces (some states have more than one entry because the source data broke it up into multiple geometries).

I'm trying to find the most efficient query for looking up the cities.geom point within all the geodata.geom polygons, and then updating cities.state_code with the appropriate data from the geodata table.

Just as a test (SELECTing, not UPDATEing), I tried this:

SELECT
    C.cityid_ppn,
    C.city,
    C.latitude,
    C.longitude,
    G.state_code,
    G.country_code
FROM cities C
LEFT JOIN geodata G ON (
    C.country_code = G.country_code
    AND ST_WITHIN(C.geom, G.geom)
)
WHERE
    G.country_code='CA'
    AND g.state_code='ON'
;

Even with this subset of data, the query takes 9 seconds. Trying to run it on the full set (i.e. without the WHERE ... clause above), I gave up after waiting 5 minutes.

What am I doing wrong? And what would be the best way to do the update of cities?

3
  • 1
    You can try to change city geom data type from GEOMETRY to POINT. Or use lon/lan if city.geom holds not the point but the city's boundries. Also, I think you should change city state & country codes data types to match them to regions (charset/collate). Unfortunately, I do not know if this really help to speed up your query. Commented Mar 19, 2019 at 22:09
  • Thanks! I tried changing cities.geom to POINT and even geodata.geom to POLYGON, and that didn't seem to affect the query speed at all. Also, changing the charset/collate on the code columns to match didn't help either. Commented Mar 20, 2019 at 11:48
  • 1
    Probably you have already seen this (looks like your case): stackoverflow.com/questions/53182385/… Commented Mar 21, 2019 at 4:08

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.