I have a multi-tenant environment where each tenant (customer) has its own schema to isolate its data. Not ideal, I know, but it was a quick port of a legacy system.

Each tenant has a "reading" table, with a composite index of 4 columns: site_code char(8), location_no int, sensor_no int, reading_dtm timestamptz.

When a new reading is added, a function is called which first checks whether there has already been a reading in the last minute (for the same site_code/location_no/sensor_no):

    IF EXISTS (
        SELECT
        FROM reading r
        WHERE r.site_code   = p_site_code
          AND r.location_no = p_location_no
          AND r.sensor_no   = p_sensor_no
          AND r.reading_dtm > p_reading_dtm - INTERVAL '1 minute'
    )
    THEN
        RETURN;
    END IF;
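For context, here is a minimal sketch of how that guard sits inside the function. The function name `add_reading`, the `p_value` parameter, and the final INSERT are my assumptions for illustration, not the real schema:

    -- Hypothetical wrapper around the dedup check; names are assumed.
    CREATE OR REPLACE FUNCTION add_reading(
        p_site_code   char(8),
        p_location_no int,
        p_sensor_no   int,
        p_reading_dtm timestamptz,
        p_value       numeric
    ) RETURNS void
    LANGUAGE plpgsql AS
    $$
    BEGIN
        IF EXISTS (
            SELECT
            FROM reading r
            WHERE r.site_code   = p_site_code
              AND r.location_no = p_location_no
              AND r.sensor_no   = p_sensor_no
              AND r.reading_dtm > p_reading_dtm - INTERVAL '1 minute'
        )
        THEN
            RETURN;  -- a reading arrived within the last minute; skip this one
        END IF;

        INSERT INTO reading (site_code, location_no, sensor_no, reading_dtm, value)
        VALUES (p_site_code, p_location_no, p_sensor_no, p_reading_dtm, p_value);
    END;
    $$;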

Now, bear in mind there are many tenants, all behaving fine except one. In that one tenant, the call takes nearly half a second rather than the usual few milliseconds, because the planner does a sequential scan on a table of nearly 2 million rows instead of an index scan.

My random_page_cost is set to 1.5.

I could understand a sequential scan if the query might return many rows, but it is only checking for the existence of any.

I've tried ANALYZE on the table, VACUUM FULL, etc., but it makes no difference.

If I put "SET LOCAL enable_seqscan = off" before the query, it works perfectly. It feels wrong, but it will have to do as a temporary solution, as this is a live system and it needs to work.
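As a stop-gap, that override can at least be scoped tightly. A sketch, assuming the check runs inside a PL/pgSQL function body: `set_config(..., true)` has the same transaction-local effect as SET LOCAL, so the setting reverts automatically and does not leak to other queries:

    -- Transaction-local override (equivalent to SET LOCAL); it reverts
    -- automatically at commit or rollback, so other statements in the
    -- session keep their normal planner settings.
    PERFORM set_config('enable_seqscan', 'off', true);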

What else can I do to help Postgres make what is clearly the better decision and use the index?

EDIT: If I do a similar query manually (outside of a function) it chooses an index.

  • For some reason PostgreSQL thinks the predicate r.reading_dtm > p_reading_dtm - INTERVAL '1 minute' is not selective enough. Commented Aug 24, 2020 at 12:18
  • I wonder if this has something to do with cached execution plans inside the function Commented Aug 24, 2020 at 12:29
  • It could be - if I imitate the query manually it chooses an index scan. Commented Aug 24, 2020 at 12:34
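Following up on the cached-plan suggestion above: PL/pgSQL caches query plans, and after a few executions the planner may switch to a generic plan built without seeing the actual parameter values, which can produce a different plan than the same query run ad hoc. On PostgreSQL 12 and later, this can be ruled out by forcing per-call planning (a diagnostic step, not necessarily the final fix):

    -- Force custom (per-call) plans for this session, PostgreSQL 12+.
    -- If the function then uses the index, a cached generic plan was at fault.
    SET plan_cache_mode = force_custom_plan;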

1 Answer

My guess is that the engine is evaluating the predicate and considers it not selective enough (it thinks too many rows will be returned), so it decides to use a table scan instead.

I would do two things:

  • Make sure you have the correct index in place:

     create index ix1 on reading (site_code, location_no, 
                                  sensor_no, reading_dtm);
    
  • Trick the optimizer by making the selectivity look better. You can do that by adding an extra (redundant) predicate, and r.reading_dtm < :p_reading_dtm:

     select 1
     from reading r
     where r.site_code   = :p_site_code
       and r.location_no = :p_location_no
       and r.sensor_no   = :p_sensor_no
       and r.reading_dtm > :p_reading_dtm - interval '1 minute'
       and r.reading_dtm < :p_reading_dtm
    
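To confirm the rewrite actually flips the plan, check it with EXPLAIN. The parameter values below are placeholders, not real data:

    -- Placeholder values; substitute a real site/location/sensor.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT 1
    FROM reading r
    WHERE r.site_code   = 'SITE0001'
      AND r.location_no = 1
      AND r.sensor_no   = 1
      AND r.reading_dtm > now() - interval '1 minute'
      AND r.reading_dtm < now();

With both bounds on reading_dtm, the plan should show an Index Scan (or Index Only Scan) on the composite index rather than a Seq Scan.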

1 Comment

Yep, option 2 seems to have fixed it. I remember doing something like that before and not being sure why it worked. That's definitely one to document, as I imagine anyone looking at it will wonder why.
