postgres select large table timeout

Question

I have a table containing about 1 million records. When I run select * from table, it times out and I see the query is in state IO: DataFileRead. When I run select * from table where id>0 and id<=2147483647, where id is the primary key, it returns all data in a couple of seconds.

Should I always include a where clause, even when returning all records?

Table schema

CREATE TABLE table
(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    batch_id integer,
    area_id integer,
    asset_group text COLLATE pg_catalog."default",
    asset_id text COLLATE pg_catalog."default",
    parent_id text COLLATE pg_catalog."default",
    reference_key text COLLATE pg_catalog."default",
    maintainer_code text COLLATE pg_catalog."default",
    type_code text COLLATE pg_catalog."default",
    super_type_code text COLLATE pg_catalog."default"
)

The primary key is an integer. If I specify the whole integer range, it returns data quickly, but without where it takes one hour. Even if I name the columns, for example select id,type_code from table, it is very slow compared to select id,type_code from table where id>0 and id<=2147483647.

Below is the execution plan without using where:

 Seq Scan on table  (cost=0.00..6894676.46 rows=630746 width=379) (actual time=2590902.656..4068047.762 rows=792777 loops=1)
Planning Time: 0.095 ms
Execution Time: 4068076.818 ms

And when using where:

Bitmap Heap Scan on table  (cost=597265.81..1252327.52 rows=630747 width=379) (actual time=72.493..211.108 rows=792777 loops=1)
  Recheck Cond: ((id > 0) AND (id < 2147483647))
  Heap Blocks: exact=30533
  ->  Bitmap Index Scan on pk_information_model_entry  (cost=0.00..597108.12 rows=630747 width=0) (actual time=64.017..64.017 rows=792777 loops=1)
        Index Cond: ((id > 0) AND (id < 2147483647))
Planning Time: 8.594 ms
Execution Time: 233.207 ms

I'm aware that using an index can improve it, but why does adding a where clause make such a difference?

Why are you using * instead of column names? It is not best practice. What are the column data types in the table, and how many columns are there?
– Sund'er, Commented Aug 17, 2022 at 8:08
Run select count(*) - count(case when id>0 and id<=2147483647 then 1 end) as diff from table to verify that you indeed select all rows when using the where clause.
– David דודו Markovitz, Commented Aug 17, 2022 at 8:31
It varies from 3 to 20 and it should have been varchar. It's the client's decision, not mine, and I'm working on why using where returns quicker. I also created a copy of the table, and select on the copy without where returns in 13 seconds, not one hour.
– pers, Commented Aug 17, 2022 at 12:23

jjanes · Accepted Answer · 2022-08-17 13:12:00Z

2

Your table seems to be massively bloated (full of totally empty pages). Using the index lets PostgreSQL skip reading those pages. You could fix it with a VACUUM FULL of the table, or by using something like pg_squeeze.
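The plans in the question are consistent with this. Assuming default planner cost settings (seq_page_cost = 1, cpu_tuple_cost = 0.01), a sequential-scan cost of 6894676.46 for an estimated 630746 rows implies about 6,888,369 pages, roughly 52 GiB at 8 kB per page, while the bitmap scan touched only 30,533 heap pages (about 240 MB). A minimal sketch for checking the on-disk size and then compacting the table follows; my_table is a placeholder name, not the real table name.

```sql
-- my_table is a hypothetical name; substitute the real table.

-- On-disk size of the table's heap, and the same figure in pages.
SELECT pg_size_pretty(pg_relation_size('my_table')) AS heap_size,
       pg_relation_size('my_table')
         / current_setting('block_size')::int       AS heap_pages;

-- Rewrite the table into a new, compact file.
-- This takes an ACCESS EXCLUSIVE lock, so reads and writes on the
-- table block until it finishes, and it needs enough free disk space
-- for the new copy of the live rows.
VACUUM FULL my_table;
```

Running the size query again afterwards should show a heap that is a small fraction of its previous size if bloat was the cause.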

You might also want to investigate how it got that way in the first place, so you can prevent it from recurring.
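As a starting point for that investigation, the statistics views can show whether autovacuum has been running on the table and whether old open transactions might be preventing it from cleaning up dead rows. This is only a sketch: my_table is a placeholder name, and the views show current counters, not the history of how the bloat built up.

```sql
-- my_table is a hypothetical name; substitute the real table.

-- Live vs. dead row estimates and when (auto)vacuum last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       autovacuum_count
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';

-- Oldest open transactions; a long-lived one can stop vacuum
-- from removing dead rows.
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       left(query, 60)    AS query_start
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start
LIMIT  10;
```

Note that a plain VACUUM makes the space in emptied pages reusable but generally returns pages to the operating system only when they sit at the end of the file, so a table that was once much larger can stay large on disk.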

To reduce planning time, PostgreSQL doesn't consider using an index unless it "might possibly be useful". But just overcoming extreme bloat is not considered to be "possibly useful", which is why it only uses the index after you introduce a dummy WHERE clause which references the column.
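The difference can be seen by comparing the plans for the two forms of the query. The sketch below uses a placeholder table name, my_table, and the plan chosen for the second statement depends on the table's statistics, so an index-based plan is likely on a bloated table but not guaranteed.

```sql
-- my_table is a hypothetical name; substitute the real table.

-- No condition on any indexed column: only a sequential scan is considered.
EXPLAIN SELECT * FROM my_table;

-- A condition on the primary key makes the index a candidate,
-- even though the condition excludes no rows.
EXPLAIN SELECT * FROM my_table WHERE id > 0;
```

The dummy condition is a workaround rather than a fix; once the bloat is removed, the plain sequential scan should be fast again.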
