I've put together a query that works, and I want to learn how to optimise it. Given a particular row in table A, the query takes its geometry and finds the closest matching geometry in table B, filtered by certain criteria.
SELECT a.id,
       closest_pt.dist,
       closest_pt.name,
       closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN LATERAL (
    SELECT a.id,
           b.name AS name,
           a.geom <-> b.way AS dist,
           st_distance(a.geom, b.way, FALSE) AS meters
    FROM "osm-polygons-uk" b
    WHERE (b.landuse = 'industrial'
           OR b.man_made = 'works')
      AND st_area(b.way, FALSE) > 15000
    ORDER BY a.geom <-> b.way
    LIMIT 1
) AS closest_pt
WHERE a.id = 'abc'
Currently the query executes in 30-90 ms, but I need to perform millions of these lookups. I tried swapping a.id='abc' for a.id IN ('abc','def','ghi',...) and looking up 10,000 at a time, but that takes 10+ minutes, which doesn't really add up.
Here's the query plan as it stands:
  ->  Index Scan using "hex-hex-uk_id_idx" on "hex-hex-uk" a (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)
        Index Cond: ((id)::text = '89195c849a3ffff'::text)
  ->  Limit (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)
        ->  Index Scan using "idx_osm-polygons-uk_geom" on "osm-polygons-uk" b (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)
              Order By: (way <-> a.geom)
              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))
              Rows Removed by Filter: 7
Planning Time: 0.142 ms
Execution Time: 33.311 ms
What would be the process for optimising a query like this? I learn best by example, so I think it makes sense to post here rather than just read about optimisation techniques.
Thanks!
CREATE TABLE "osm-polygons-uk" (id bigint, name text, landuse text, man_made text, way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);
CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);
Try ST_Subdivide first (in an indexed materialized view or otherwise).
Please show the output of EXPLAIN (ANALYZE, BUFFERS). Turn track_io_timing on first if you can.
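As a rough sketch of how those two suggestions might fit together, the following pre-filters and subdivides the polygons into a materialized view, indexes it, and then runs the lookup under EXPLAIN (ANALYZE, BUFFERS). The view name "osm-industrial-subdivided" and the vertex limit of 64 are assumptions made for illustration, not part of the original schema, and nothing here has been measured against the real data. It also assumes the geometries are lon/lat, as the geography casts in the posted plan imply.

```sql
-- Hypothetical view name; 64 vertices per piece is an arbitrary starting point to tune.
CREATE MATERIALIZED VIEW "osm-industrial-subdivided" AS
SELECT b.id,
       b.name,
       ST_Subdivide(b.way, 64) AS way      -- one row per piece of each polygon
FROM "osm-polygons-uk" b
WHERE (b.landuse = 'industrial' OR b.man_made = 'works')
  AND ST_Area(b.way::geography, false) > 15000;   -- filter once, on the whole polygon

-- Spatial index on the pieces, so the <-> ordering can use it.
CREATE INDEX ON "osm-industrial-subdivided" USING gist (way);
ANALYZE "osm-industrial-subdivided";

-- Changing track_io_timing usually requires superuser (or a granted) privilege.
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id,
       closest_pt.dist,
       closest_pt.name,
       closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN LATERAL (
    SELECT s.name,
           a.geom <-> s.way AS dist,
           ST_Distance(a.geom::geography, s.way::geography, false) AS meters
    FROM "osm-industrial-subdivided" s
    ORDER BY a.geom <-> s.way               -- no per-row filter left to evaluate
    LIMIT 1
) AS closest_pt
WHERE a.id = 'abc';
```

Because the pieces of a polygon together cover the same area as the polygon, the distance to the nearest piece is the distance to the nearest qualifying polygon, so the result should match the original query. The view would need a REFRESH MATERIALIZED VIEW whenever "osm-polygons-uk" changes.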