I have a table containing about 1 million records.
When I run select * from table, the query times out, and I can see that it is in the state IO: DataFileRead.
When I run select * from table where id>0 and id<=2147483647, where id is the primary key, it returns all the data in a couple of seconds.
Should I always include a where clause, even when I want to return all records?
Table schema:
CREATE TABLE table
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
batch_id integer,
area_id integer,
asset_group text COLLATE pg_catalog."default",
asset_id text COLLATE pg_catalog."default",
parent_id text COLLATE pg_catalog."default",
reference_key text COLLATE pg_catalog."default",
maintainer_code text COLLATE pg_catalog."default",
type_code text COLLATE pg_catalog."default",
super_type_code text COLLATE pg_catalog."default"
)
The primary key is an integer. If I specify the whole integer range, the query returns data quickly, but without the where clause it takes about an hour.
Even if I list column names, for example select id,type_code from table, it is very slow compared to select id,type_code from table where id>0 and id<=2147483647.
Below is the execution plan without using where:
Seq Scan on table (cost=0.00..6894676.46 rows=630746 width=379) (actual time=2590902.656..4068047.762 rows=792777 loops=1)
Planning Time: 0.095 ms
Execution Time: 4068076.818 ms
And when using where:
Bitmap Heap Scan on table (cost=597265.81..1252327.52 rows=630747 width=379) (actual time=72.493..211.108 rows=792777 loops=1)
Recheck Cond: ((id > 0) AND (id < 2147483647))
Heap Blocks: exact=30533
-> Bitmap Index Scan on pk_information_model_entry (cost=0.00..597108.12 rows=630747 width=0) (actual time=64.017..64.017 rows=792777 loops=1)
Index Cond: ((id > 0) AND (id < 2147483647))
Planning Time: 8.594 ms
Execution Time: 233.207 ms
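Plans like the two above, which include actual timings, can be regenerated with EXPLAIN ANALYZE. The sketch below also adds the BUFFERS option, which the output above does not include, to show how many pages each plan reads. Here my_table is a placeholder for the real table name, which is not given in the question.

```sql
-- my_table is a hypothetical stand-in for the real table name.
-- Note: EXPLAIN ANALYZE actually executes the statement, so the first
-- query will run for as long as the slow query itself does.

-- Plan without a where clause (expected: Seq Scan)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table;

-- Plan with the full-range predicate on the primary key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table
WHERE id > 0 AND id <= 2147483647;
```

Comparing the "Buffers:" lines of the two outputs would show whether the sequential scan touches far more pages than the 30533 heap blocks reported for the bitmap scan.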
I'm aware that using an index can improve performance, but why does adding a where clause make such a difference?
Run select count(*) - count(case when id>0 and id<=2147483647 then 1 end) as diff from table to verify that you indeed select all rows when using the where clause. A diff of 0 means the where clause excludes no rows.
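A further check, offered only as a sketch, is to compare the table's size on disk with the Heap Blocks: exact=30533 figure from the bitmap scan plan. If the table holds far more pages than that, the sequential scan is reading many pages the bitmap scan never visits. That would be something to investigate, not a confirmed diagnosis. As before, my_table is a placeholder for the real table name.

```sql
-- my_table is a hypothetical stand-in for the real table name.

-- Total heap size, expressed in pages and in human-readable form
SELECT pg_relation_size('my_table') / current_setting('block_size')::bigint AS heap_pages,
       pg_size_pretty(pg_relation_size('my_table')) AS heap_size;

-- Live vs. dead tuple estimates and the most recent vacuum times
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```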