postgres query slow perf: not in faster than prop equal

Question

So I have this query on a pretty big table:

SELECT * FROM datTable WHERE type='bla' 
AND timestamp > (CURRENT_DATE - INTERVAL '1 day')

This query is too slow, around 5 seconds, even though there is an index on type.

So I tried:

SELECT * FROM datTable WHERE type NOT IN ('blu','bli','blo') 
AND timestamp > (CURRENT_DATE - INTERVAL '1 day')

This query is much better, around 1 second, but I don't want to hardcode the list of excluded types.

So I tried:

with res as (
    SELECT * FROM datTable WHERE type NOT IN ('blu','bli','blo') 
    AND timestamp > (CURRENT_DATE - INTERVAL '1 day')
)
select * from res where type='bla'

And I'm back to bad performance: 5 seconds, the same as before.

Any idea how I could trick Postgres into giving me the 1-second performance while specifying positively the type I want ('bla')?

EDIT: EXPLAIN ANALYZE for the last request

GroupAggregate  (cost=677400.59..677493.09 rows=3595 width=59) (actual time=4789.667..4803.183 rows=3527 loops=1)
  Group Key: event_historic.sender
  ->  Sort  (cost=677400.59..677412.48 rows=4756 width=23) (actual time=4789.646..4792.808 rows=68045 loops=1)
        Sort Key: event_historic.sender
        Sort Method: quicksort  Memory: 9469kB
        ->  Bitmap Heap Scan on event_historic  (cost=505379.21..677110.11 rows=4756 width=23) (actual time=4709.494..4769.437 rows=68045 loops=1)
              Recheck Cond: (("timestamp" > (CURRENT_DATE - '1 day'::interval)) AND ((type)::text = 'NEAR_TRANSFER'::text))
              Heap Blocks: exact=26404
              ->  BitmapAnd  (cost=505379.21..505379.21 rows=44676 width=0) (actual time=4706.080..4706.082 rows=0 loops=1)
                    ->  Bitmap Index Scan on event_historic_timestamp_idx  (cost=0.00..3393.89 rows=263109 width=0) (actual time=167.838..167.838 rows=584877 loops=1)
                          Index Cond: ("timestamp" > (CURRENT_DATE - '1 day'::interval))
                    ->  Bitmap Index Scan on event_historic_type_idx  (cost=0.00..501982.69 rows=45316549 width=0) (actual time=4453.071..4453.071 rows=44279973 loops=1)
                          Index Cond: ((type)::text = 'NEAR_TRANSFER'::text)
Planning Time: 0.385 ms
JIT:
  Functions: 10
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.505 ms, Inlining 18.102 ms, Optimization 87.745 ms, Emission 44.270 ms, Total 152.622 ms
Execution Time: 4809.099 ms

EDIT 2: After adding the index on (type, timestamp) the result is way faster:

HashAggregate  (cost=156685.88..156786.59 rows=8057 width=59) (actual time=95.201..96.511 rows=3786 loops=1)
  Group Key: sender
  Batches: 1  Memory Usage: 2449kB
  Buffers: shared hit=31041
  ->  Index Scan using typetimestamp on event_historic eh  (cost=0.57..156087.67 rows=47857 width=44) (actual time=12.244..55.921 rows=76220 loops=1)
        Index Cond: (((type)::text = 'NEAR_TRANSFER'::text) AND ("timestamp" > (CURRENT_DATE - '1 day'::interval)))
        Buffers: shared hit=31041
Planning:
  Buffers: shared hit=5
Planning Time: 0.567 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.543 ms, Inlining 0.000 ms, Optimization 1.221 ms, Emission 10.819 ms, Total 14.584 ms
Execution Time: 99.496 ms

No answer possible without EXPLAIN (ANALYZE, BUFFERS) output for the queries. Please add that to the question.
– Laurenz Albe, Commented Dec 16, 2022 at 10:17
So it is the Bitmap Index Scan on event_historic_type_idx, for the WHERE condition on type, that is slow. But how can I improve this?
– François Richard, Commented Dec 16, 2022 at 10:50
While the multicolumn index is surely a good answer, I still have to wonder what the heck is going on with your 2nd query that it is faster. Could you show the plan for that one?
– jjanes, Commented Dec 16, 2022 at 21:00

Laurenz Albe · Accepted Answer · 2022-12-16 12:53:26Z

1

You need a two-column index on ((type::text), timestamp) to make that query fast.
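As a minimal sketch, assuming the table is event_historic as in the plans posted in the question, the index could be created and checked roughly like this. The index name is made up, and the SELECT is only a guess at the shape of the real query (the plans show a grouping on sender, but the full statement is not in the question). The plain column form (type, timestamp) is used here because that is what EDIT 2 in the question reports.

```sql
-- Assumption: table and column names are taken from the EXPLAIN output in the question.
-- The index name event_historic_type_ts_idx is hypothetical.
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY event_historic_type_ts_idx
    ON event_historic (type, "timestamp");

-- Refresh planner statistics for the table.
ANALYZE event_historic;

-- Check that the new index is used and how many buffers are touched.
-- The query shape (count per sender) is a guess for illustration only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sender, count(*)
FROM event_historic
WHERE type = 'NEAR_TRANSFER'
  AND "timestamp" > CURRENT_DATE - INTERVAL '1 day'
GROUP BY sender;
```

If the plan shows an Index Scan with both conditions in the Index Cond line, the index is being used as intended.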

Let me explain the reasoning behind the index order in detail. If type is first in the index, the index scan can start with the first index entry after ('NEAR_TRANSFER', <now - 1 day>) and scan all index entries until it hits the next type, so all the index entries that are found correspond to a result row. If the index order is the other way around, the scan has to start at the first entry after (<now - 1 day>, ...) and read all index entries up to the end of the index. It discards the index entries where type IS DISTINCT FROM 'NEAR_TRANSFER' and fetches the table rows for the remaining index entries. So this scan will fetch the same number of table rows, but has to read more index entries.

It is an old myth that the most selective column should be the first in the index, but it is nonetheless a myth. For the reason described above, you should have the columns that are compared with = first in the index. The selectivity of the columns is irrelevant.
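One way to check this claim yourself is a throwaway experiment on synthetic data. The sketch below is an illustration under stated assumptions: the table idx_order_demo, the index names, and the data distribution (about 80% of rows with the common type, one row per second going back roughly 23 days) are all invented. The planner may pick different plans on your system, so compare the buffer counts in the output rather than expecting a particular result.

```sql
-- Hypothetical scratch table; nothing here comes from the original schema.
CREATE TABLE idx_order_demo (
    type        text        NOT NULL,
    "timestamp" timestamptz NOT NULL
);

-- 2 million rows, one per second going back in time, 80% of them 'NEAR_TRANSFER'.
INSERT INTO idx_order_demo (type, "timestamp")
SELECT CASE WHEN i % 10 < 8 THEN 'NEAR_TRANSFER'
            ELSE 'OTHER_' || (i % 10)::text
       END,
       now() - i * interval '1 second'
FROM generate_series(1, 2000000) AS g(i);

-- Variant 1: equality column first.
CREATE INDEX idx_order_demo_type_ts ON idx_order_demo (type, "timestamp");
ANALYZE idx_order_demo;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM idx_order_demo
WHERE type = 'NEAR_TRANSFER'
  AND "timestamp" > now() - interval '1 day';

-- Variant 2: range column first.
DROP INDEX idx_order_demo_type_ts;
CREATE INDEX idx_order_demo_ts_type ON idx_order_demo ("timestamp", type);

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM idx_order_demo
WHERE type = 'NEAR_TRANSFER'
  AND "timestamp" > now() - interval '1 day';

-- Clean up.
DROP TABLE idx_order_demo;
```

Both variants return the same rows; the difference to look for is how many index entries and buffers each scan has to read to find them.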

All this is speaking about a single query in isolation. But you always have to consider all the other queries in the workload, and for them it may make a difference how the columns are ordered.

edited Dec 16, 2022 at 12:53

answered Dec 16, 2022 at 11:52

Laurenz Albe

3 Comments

François Richard Over a year ago

Why does type go first? Frank, below, suggests putting timestamp first.

Frank Heikens Over a year ago

@Laurenz Albe: Could you elaborate on the column order, and especially on the role of the equality comparison? I always start with the column that reduces the dataset best, without checking for = or > operators.

Laurenz Albe Over a year ago

@FrankHeikens I have added an explanation.

Frank Heikens · Accepted Answer · 2022-12-16 10:55:01Z

1

A single index on timestamp and type might be faster:

CREATE INDEX idx1 ON datTable (timestamp, type);

Or maybe:

CREATE INDEX idx1 ON datTable (type, timestamp);

Check the query plan to see whether the new index is used. You may have to drop an old index as well, and most likely you can drop that one anyway.
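A rough sketch of how the old indexes could be reviewed before dropping anything, assuming the table name datTable from the question (stored as dattable in the catalog because unquoted identifiers are folded to lower case). The name of the index being dropped is hypothetical; substitute whatever the first query reports.

```sql
-- How often has each index on the table been scanned since statistics were last reset,
-- and how much space does it take?
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'dattable'
ORDER BY idx_scan;

-- If an old single-column index turns out to be unused, it could be dropped.
-- Assumption: dattable_type_idx is a placeholder name, not one from the question.
DROP INDEX IF EXISTS dattable_type_idx;
```

An index with a low idx_scan count after the new index has been in place for a representative period is a candidate for removal, but other queries in the workload may still depend on it.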

answered Dec 16, 2022 at 10:55

Frank Heikens

4 Comments

François Richard Over a year ago

That's an idea, thank you. Any idea why the index column order would change anything (the difference between your two solutions)?

Frank Heikens Over a year ago

Check the query plan: based on the timestamp you need just 584877 records, while based on type you need 44279973 records. That's why it makes sense for timestamp to go first.

Frank Heikens Over a year ago

@FrançoisRichard: Could you share the new query plan after creating the new index or indexes? And did the performance improve?

François Richard Over a year ago

Sure, I edited the question. Performance improved dramatically, as you can see.
