postgres min() slower than min() with group by

Question

I'm using Postgres 12.10 AWS RDS. My query to find a min date using a group by is faster than just a regular min on the date. I'd like the regular min to be as fast but not sure if I'm putting in the wrong indices or I need to tune another parameter.

I have a table

CREATE TABLE IF NOT EXISTS public.ed
(
    isd character varying(90) COLLATE pg_catalog."default" NOT NULL,
    e_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
    d_date timestamp with time zone NOT NULL,
    CONSTRAINT ed_pkey PRIMARY KEY (isd, e_id)
)

Indices:

CREATE INDEX IF NOT EXISTS ix_ed_d_date
    ON public.ed USING btree
    (d_date ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS ix_ed_e_id
    ON public.ed USING btree
    (e_id COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

Query without just min takes 3 minutes:

select min(d_date)
from ed
where e_id = '62e2032b029b036ba25c73cf';

Explain Analyze for query:

Result  (cost=171.70..171.71 rows=1 width=8) (actual time=186940.968..186941.463 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.56..171.70 rows=1 width=8) (actual time=186940.963..186940.964 rows=1 loops=1)
          ->  Index Scan using ix_ed_d_date on ed  (cost=0.56..2214942.25 rows=12943 width=8) (actual time=186940.961..186940.962 rows=1 loops=1)
                Index Cond: (d_date IS NOT NULL)
                Filter: ((e_id)::text = '62e2032b029b036ba25c73cf'::text)
                Rows Removed by Filter: 30539883
Planning Time: 0.195 ms
Execution Time: 186941.491 ms

While the query with the group by takes less than a second:

select min(d_date)
from ed
where e_id in ('62e2032b029b036ba25c73cf')
group by e_id;

Explain Analyze:

GroupAggregate  (cost=0.56..5365.73 rows=2319 width=33) (actual time=92.093..92.095 rows=1 loops=1)
  Group Key: e_id
  ->  Index Scan using ix_ed_e_id on ed  (cost=0.56..5277.83 rows=12943 width=33) (actual time=6.753..90.622 rows=6698 loops=1)
        Index Cond: ((e_id)::text = '62e2032b029b036ba25c73cf'::text)
Planning Time: 0.098 ms
Execution Time: 92.127 ms

I get the same result, but why is the simpler query using the d_date index? How can I make the simple min(d_date) without group by as performant as the one that uses the group by?

Laurenz Albe · Accepted Answer · 2022-10-13 10:19:58Z

3

'62e2032b029b036ba25c73cf' seems to be a frequent e_id, so PostgreSQL thinks that it can win by looking at the rows in d_date order until it finds one with the correct e_id. Unfortunately all rows with e_id = '62e2032b029b036ba25c73cf' have high d_date, and PostgreSQL has to skip 30 million rows until it finds a hit.

You can prevent PostgreSQL from using the index if you use min(d_date + INTERVAL '0'). Then the first query should be as fast as the second.

Creating a composite index on (e_id, d_date) would be the perfect solution to the problem, then PostgreSQL cannot choose the wrong index by mistake.

edited Oct 13, 2022 at 10:19

answered Oct 13, 2022 at 8:42

Laurenz Albe

62.8k4 gold badges58 silver badges94 bronze badges

ERROR: operator does not exist: timestamp with time zone + integer LINE 1: select min(d_date + 0) ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. SQL state: 42883 Character: 26

Foo L
– Foo L

2022-10-13 08:45:36 +00:00
Commented Oct 13, 2022 at 8:45
1

Who would come up with a timestamp column and name it date? Anyway, I'll update the answer.

Laurenz Albe
– Laurenz Albe

2022-10-13 08:46:41 +00:00
Commented Oct 13, 2022 at 8:46
agreed. that was bad naming. your suggestion does work. is there anything else besides adding this hack to the query to get the same performance? ORM frameworks would not know about about this extra requirement.

Foo L
– Foo L

2022-10-13 08:52:25 +00:00
Commented Oct 13, 2022 at 8:52
2

A composite index on (e_id, date) might also help choosing a better plan (and then you can remove the single (e_id) index.

ypercubeᵀᴹ
– ypercubeᵀᴹ

2022-10-13 09:12:49 +00:00
Commented Oct 13, 2022 at 9:12
1

@ypercubeᵀᴹ Of course. i took the liberty to add it to the answer.

Laurenz Albe
– Laurenz Albe

2022-10-13 10:20:25 +00:00
Commented Oct 13, 2022 at 10:20

| Show 3 more comments

Stack Exchange Network

postgres min() slower than min() with group by

1 Answer 1

Your Answer

Hot Network Questions

postgres min() slower than min() with group by

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions