
I have the following PostgreSQL table:

CREATE TABLE staff (
    id integer primary key,
    full_name  VARCHAR(100) NOT NULL,
    department VARCHAR(100) NULL,
    tier bigint
);

I filled this table with random data using the following block:

do $$
begin
    FOR counter IN 1 .. 100000 LOOP
        INSERT INTO staff (id, full_name, department, tier)
        VALUES (nextval('staff_sequence'),
                random_string(10),
                get_department(),
                floor(random() * 5 + 1)::bigint);
    end LOOP;
end; $$;
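
For reference, the sequence and the two helper functions used above are not built-ins; they look roughly like this (the exact implementations are illustrative and don't matter for the question):

-- illustrative definitions, roughly what the helpers above do
CREATE SEQUENCE staff_sequence;

-- returns a random lowercase string of the given length
CREATE OR REPLACE FUNCTION random_string(len integer) RETURNS text AS $f$
    SELECT string_agg(substr('abcdefghijklmnopqrstuvwxyz', (floor(random() * 26) + 1)::int, 1), '')
    FROM generate_series(1, len);
$f$ LANGUAGE sql VOLATILE;

-- picks one of a handful of department names at random
CREATE OR REPLACE FUNCTION get_department() RETURNS text AS $f$
    SELECT (ARRAY['Sales', 'Engineering', 'HR', 'Finance', 'Support'])[floor(random() * 5 + 1)::int];
$f$ LANGUAGE sql VOLATILE;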

After the data was populated, I created an index on the tier column:

create index staff_tier_idx on staff(tier);

Although I created this index, I want it NOT to be used when I query on this column. To accomplish this, I tried this query:

select count(*) from staff where tier=1::numeric;

Because the data types of the indexed column and the query condition don't match, I thought the index would not be used and a sequential scan would be executed instead. However, when I run EXPLAIN ANALYZE on the above query, I get the following output:

Aggregate  (cost=2349.54..2349.55 rows=1 width=8) (actual time=17.078..17.079 rows=1 loops=1)
  ->  Index Only Scan using staff_tier_idx on staff  (cost=0.29..2348.29 rows=500 width=0) (actual time=0.022..15.925 rows=19942 loops=1)
        Filter: ((tier)::numeric = '1'::numeric)
        Rows Removed by Filter: 80058
        Heap Fetches: 0
Planning Time: 0.305 ms
Execution Time: 17.130 ms

This shows that the index has indeed been used.

How do I change this so that the query uses a sequential scan instead of the index? This is purely for testing/learning purposes.

If it's of any importance, I am running this on an Amazon RDS database instance.

  • From the Planner Method Configuration docs: SET enable_indexonlyscan TO off; Commented Mar 1, 2022 at 23:26
  • It rather ruins the example when it uses functions we don't have access to. Commented Mar 2, 2022 at 2:58

1 Answer


From the "Filter" rows of the plan like

 Rows Removed by Filter: 80058

you can see that the index is not being used as a real index, but just as a skinny table, testing the casted condition for each row. This appears favorable because the index is less than 1/4 the size of the table, while the default ratio of random_page_cost/seq_page_cost = 4.
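
You can check the relative sizes yourself; for example:

-- compare the on-disk size of the table and the index
SELECT pg_size_pretty(pg_relation_size('staff'))          AS table_size,
       pg_size_pretty(pg_relation_size('staff_tier_idx')) AS index_size;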

In addition to outright disabling index scans, as Adrian already suggested, you could also discourage this "skinny table" usage by increasing random_page_cost, since index pages are assumed to be read in random order.
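
For example, at the session level (100 is just an exaggerated, illustrative value):

SET random_page_cost = 100;   -- default is 4.0; makes index pages look expensive
-- or, as suggested in the comments, disable the scan types directly:
SET enable_indexscan = off;
SET enable_indexonlyscan = off;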

Another method would be to change the query so that it can't use an index-only scan. For example, using count(full_name) would do that, as PostgreSQL then needs to visit the table to make sure full_name is not NULL (even though a NOT NULL constraint already asserts that; sometimes the planner is not very clever).
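
For example:

select count(full_name) from staff where tier = 1::numeric;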

Which method is better depends on what it is you are wanting to test/learn.


2 Comments

Flat-out disabling index scans was not an option, as I want index scans to be enabled but Postgres to choose not to use the index. Adding count(full_name) did the trick. If it has to visit the table to make sure full_name is not NULL, with count(*) won't it have to visit the table to make sure full_name AND department are not NULL?
count(*) counts every row, even if all columns in the row are NULL. It is a bit special. As long as the table page is marked "all visible" in the visibility map (VM), it won't need to visit that page.
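
For illustration, count(*) and count(department) can differ when department contains NULLs:

-- count(*) counts every row; count(department) skips rows where department IS NULL
select count(*), count(department) from staff;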
