I'm using Postgres 12.10 AWS RDS. My query to find a min date using a group by is faster than just a regular min on the date. I'd like the regular min to be as fast but not sure if I'm putting in the wrong indices or I need to tune another parameter.
I have a table
CREATE TABLE IF NOT EXISTS public.ed
(
isd character varying(90) COLLATE pg_catalog."default" NOT NULL,
e_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
d_date timestamp with time zone NOT NULL,
CONSTRAINT ed_pkey PRIMARY KEY (isd, e_id)
)
Indices:
CREATE INDEX IF NOT EXISTS ix_ed_d_date
ON public.ed USING btree
(d_date ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS ix_ed_e_id
ON public.ed USING btree
(e_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
Query without just min takes 3 minutes:
select min(d_date)
from ed
where e_id = '62e2032b029b036ba25c73cf';
Explain Analyze for query:
Result (cost=171.70..171.71 rows=1 width=8) (actual time=186940.968..186941.463 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.56..171.70 rows=1 width=8) (actual time=186940.963..186940.964 rows=1 loops=1)
-> Index Scan using ix_ed_d_date on ed (cost=0.56..2214942.25 rows=12943 width=8) (actual time=186940.961..186940.962 rows=1 loops=1)
Index Cond: (d_date IS NOT NULL)
Filter: ((e_id)::text = '62e2032b029b036ba25c73cf'::text)
Rows Removed by Filter: 30539883
Planning Time: 0.195 ms
Execution Time: 186941.491 ms
While the query with the group by takes less than a second:
select min(d_date)
from ed
where e_id in ('62e2032b029b036ba25c73cf')
group by e_id;
Explain Analyze:
GroupAggregate (cost=0.56..5365.73 rows=2319 width=33) (actual time=92.093..92.095 rows=1 loops=1)
Group Key: e_id
-> Index Scan using ix_ed_e_id on ed (cost=0.56..5277.83 rows=12943 width=33) (actual time=6.753..90.622 rows=6698 loops=1)
Index Cond: ((e_id)::text = '62e2032b029b036ba25c73cf'::text)
Planning Time: 0.098 ms
Execution Time: 92.127 ms
I get the same result, but why is the simpler query using the d_date index? How can I make the simple min(d_date) without group by as performant as the one that uses the group by?