
I have the following PostgreSQL table:

CREATE TABLE staff (
    id integer primary key,
    full_name  VARCHAR(100) NOT NULL,
    department VARCHAR(100) NULL,
    tier bigint
);

I filled this table with random data using the following block:

do $$
begin
    FOR counter IN 1 .. 100000 LOOP
        INSERT INTO staff (id, full_name, department, tier)
        VALUES (nextval('staff_sequence'),
                random_string(10),
                get_department(),
                floor(random() * 5 + 1)::bigint);
    end LOOP;
end; $$;
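
For reference, the sequence and the two helper functions used above are not built-ins; they look roughly like this (the exact implementations are illustrative and don't matter for the question):

-- illustrative definitions, roughly what the helpers above do
CREATE SEQUENCE staff_sequence;

-- returns a random lowercase string of the given length
CREATE OR REPLACE FUNCTION random_string(len integer) RETURNS text AS $f$
    SELECT string_agg(substr('abcdefghijklmnopqrstuvwxyz', (floor(random() * 26) + 1)::int, 1), '')
    FROM generate_series(1, len);
$f$ LANGUAGE sql VOLATILE;

-- picks one of a handful of department names at random
CREATE OR REPLACE FUNCTION get_department() RETURNS text AS $f$
    SELECT (ARRAY['Sales', 'Engineering', 'HR', 'Finance', 'Support'])[floor(random() * 5 + 1)::int];
$f$ LANGUAGE sql VOLATILE;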

After the data was populated, I created an index on the tier column:

create index staff_tier_idx on staff(tier);

Although I created this index, I want it NOT to be used when I query on this column. To accomplish this, I tried this query:

select count(*) from staff where tier=1::numeric;

Because the data types of the indexed column and the query condition don't match, I thought the index would not be used and a sequential scan would be executed instead. However, when I run EXPLAIN ANALYZE on the above query, I get the following output:

Aggregate  (cost=2349.54..2349.55 rows=1 width=8) (actual time=17.078..17.079 rows=1 loops=1)
  ->  Index Only Scan using staff_tier_idx on staff  (cost=0.29..2348.29 rows=500 width=0) (actual time=0.022..15.925 rows=19942 loops=1)
        Filter: ((tier)::numeric = '1'::numeric)
        Rows Removed by Filter: 80058
        Heap Fetches: 0
Planning Time: 0.305 ms
Execution Time: 17.130 ms

This shows that the index has indeed been used.

How do I change this so that the query uses a sequential scan instead of the index? This is purely for testing/learning purposes.

If it's of any importance, I am running this on an Amazon RDS database instance.

  • From the Planner Method Configuration docs: SET enable_indexonlyscan TO off; Commented Mar 1, 2022 at 23:26
  • It rather ruins the example when it uses functions we don't have access to. Commented Mar 2, 2022 at 2:58

1 Answer


From the "Filter" rows of the plan like

 Rows Removed by Filter: 80058

you can see that the index is not being used as a real index, but just as a skinny table, testing the casted condition for each row. This appears favorable because the index is less than 1/4 the size of the table, while the default ratio of random_page_cost/seq_page_cost = 4.
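
You can check the relative sizes yourself; for example:

-- compare the on-disk size of the table and the index
SELECT pg_size_pretty(pg_relation_size('staff'))          AS table_size,
       pg_size_pretty(pg_relation_size('staff_tier_idx')) AS index_size;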

In addition to outright disabling index scans, as Adrian already suggested, you could also discourage this "skinny table" usage by increasing random_page_cost, since index pages are assumed to be read in random order.
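
For example, at the session level (100 is just an exaggerated, illustrative value):

SET random_page_cost = 100;   -- default is 4.0; makes index pages look expensive
-- or, as suggested in the comments, disable the scan types directly:
SET enable_indexscan = off;
SET enable_indexonlyscan = off;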

Another method would be to change the query so that it can't use an index-only scan. For example, using count(full_name) would do that, as PostgreSQL then needs to visit the table to make sure full_name is not NULL (even though a NOT NULL constraint already asserts that; sometimes the planner is not very clever).
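
For example:

select count(full_name) from staff where tier = 1::numeric;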

Which method is better depends on what it is you are wanting to test/learn.


2 Comments

Flat-out disabling index scans was not an option, as I want index scans to be enabled but Postgres to choose not to use the index. Adding count(full_name) did the trick. If it has to visit the table to make sure full_name is not NULL, with count(*) won't it have to visit the table to make sure full_name AND department are not NULL?
count(*) counts every row, even if all columns in the row are NULL. It is a bit special. As long as the table page is marked "all visible" in the visibility map (VM), it won't need to visit that page.
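
For illustration, count(*) and count(department) can differ when department contains NULLs:

-- count(*) counts every row; count(department) skips rows where department IS NULL
select count(*), count(department) from staff;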
