I've put together a query that works, and I want to learn how to optimise it. Given a particular row in table A, the query takes its geometry and finds the closest geometry in table B, filtered by certain criteria.

SELECT     a.id,
           closest_pt.dist,
           closest_pt.name,
           closest_pt.meters
FROM       "hex-hex-uk" a
CROSS JOIN lateral
       (
                SELECT   a.id,
                         b.name                            AS name,
                         a.geom <-> b.way                  AS dist,
                         st_distance(a.geom, b.way, FALSE) AS meters
                FROM     "osm-polygons-uk" b
                WHERE    (
                                  b.landuse='industrial'
                         OR       b.man_made='works')
                AND      st_area(b.way, FALSE)>15000
                ORDER BY a.geom <-> b.way
                LIMIT    1) AS closest_pt
WHERE      a.id='abc'

Currently the query executes in 30-90 ms, but I need to perform millions of these lookups. I tried swapping a.id='abc' for a.id IN ('abc','def','ghi',...) and looking up 10000 ids at a time, but that takes 10+ minutes, which doesn't really add up.

Here's the query plan as it stands:

  ->  Index Scan using "hex-hex-uk_id_idx" on "hex-hex-uk" a  (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)
        Index Cond: ((id)::text = '89195c849a3ffff'::text)
  ->  Limit  (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)
        ->  Index Scan using "idx_osm-polygons-uk_geom" on "osm-polygons-uk" b  (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)
              Order By: (way <-> a.geom)
              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))
              Rows Removed by Filter: 7
Planning Time: 0.142 ms
Execution Time: 33.311 ms

What would be the process for optimising a query like this? I learn best by example, so I thought it made sense to post here rather than just read about optimisation techniques.

Thanks!

CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);

CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);
  • We still need to see the CREATE TABLE statements and the indexes; they are vital for optimizing queries. Commented Apr 25, 2022 at 12:09
  • Apologies, added Commented Apr 25, 2022 at 12:13
  • landuse and man_made could use a combined index Commented Apr 25, 2022 at 12:41
  • You can make a partial index that covers only the 3 conditions. Otherwise, if the matched polygons have many vertices, you can look at applying st_subdivide first (in an indexed materialized view or similar). Commented Apr 25, 2022 at 13:39
  • Collect the plan using EXPLAIN (ANALYZE, BUFFERS). Turn track_io_timing on first if you can. Commented Apr 25, 2022 at 13:59
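
The partial-index and EXPLAIN suggestions in the comments above could look roughly like the following. This is only a sketch and was not posted in the thread: the index name is hypothetical, it assumes ST_Area on geography is immutable in the installed PostGIS version (required for an index predicate), and whether the planner actually picks the partial index depends on the query's WHERE clause matching the predicate.

```sql
-- Hypothetical partial GiST index: only rows passing the three filter
-- conditions are indexed, so a KNN scan should not have to visit and
-- discard non-matching neighbours.
CREATE INDEX "idx_osm-polygons-uk_industrial"
    ON "osm-polygons-uk" USING gist (way)
    WHERE (landuse = 'industrial' OR man_made = 'works')
      AND ST_Area(way::geography, false) > 15000;

-- Changing track_io_timing normally needs elevated privileges;
-- skip this line if you are not allowed to set it.
SET track_io_timing = on;

-- Same query as in the question, with buffer and I/O statistics
-- added to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id, closest_pt.name, closest_pt.dist, closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN LATERAL (
    SELECT b.name,
           a.geom <-> b.way                  AS dist,
           ST_Distance(a.geom, b.way, false) AS meters
    FROM "osm-polygons-uk" b
    WHERE (b.landuse = 'industrial' OR b.man_made = 'works')
      AND ST_Area(b.way::geography, false) > 15000
    ORDER BY a.geom <-> b.way
    LIMIT 1
) AS closest_pt
WHERE a.id = 'abc';
```

If the partial index is used, the plan should name it in the inner Index Scan, and the Filter / "Rows Removed by Filter" lines should shrink or disappear.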

1 Answer

Some great tips above. The comment about the indexed materialized view led me to create a view containing only the filtered data. It cut the number of rows down from 1 million to ~20000 and executed in a couple of seconds.
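
For anyone following along, a filtered view like this could be built roughly as below. This is a sketch, not the exact statements I ran: it assumes tmp_industrial is a materialized view carrying the same three filter conditions as the original query, and the index name is made up for illustration.

```sql
-- Assumption: tmp_industrial holds only the rows that pass the original filter.
CREATE MATERIALIZED VIEW tmp_industrial AS
SELECT id, name, way
FROM "osm-polygons-uk"
WHERE (landuse = 'industrial' OR man_made = 'works')
  AND ST_Area(way::geography, false) > 15000;

-- A GiST index on the geometry column lets the <-> operator
-- do an index-ordered nearest-neighbour scan.
-- (Index name is hypothetical.)
CREATE INDEX tmp_industrial_way_idx ON tmp_industrial USING gist (way);

-- Refresh planner statistics for the new relation.
ANALYZE tmp_industrial;
```

Because the filter is applied once when the view is built, the nearest-neighbour scan no longer has to evaluate ST_Area or the landuse/man_made checks for each candidate row. The view needs a REFRESH MATERIALIZED VIEW if the underlying table changes.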

From there I tweaked the original query, and it ended up blasting through 2400000 rows in a couple of minutes. A huge improvement on the 13 hours the original was going to take to run!

SELECT     a.id,
           closest_pt.name,
           ST_Distance(a.geom, closest_pt.way, false) AS meters
FROM       "hex-hex-uk" a
CROSS JOIN LATERAL
       (
                SELECT   b.id,
                         b.name           AS name,
                         a.geom <-> b.way AS dist,
                         b.way            AS way
                FROM     "tmp_industrial" b
                ORDER BY dist ASC
                LIMIT    1) AS closest_pt
WHERE      a.id IN ('abc','def','ghi',...);

Thanks for the tips, it gives me a bit of a guide as to how to go about debugging query performance.
