
So I have this query on a pretty big table:

SELECT * FROM datTable WHERE type='bla' 
AND timestamp > (CURRENT_DATE - INTERVAL '1 day')

This query is too slow (about 5 seconds), even though there is an index on type.

So I tried:

SELECT * FROM datTable WHERE type NOT IN ('blu','bli','blo') 
AND timestamp > (CURRENT_DATE - INTERVAL '1 day')

This query is much faster (about 1 second), but I don't want this list of excluded types hard-coded.

So I tried:

with res as (
    SELECT * FROM datTable WHERE type NOT IN ('blu','bli','blo') 
    AND timestamp > (CURRENT_DATE - INTERVAL '1 day')
)
select * from res where type='bla'

And I'm back to bad performance: 5 seconds, same as before.

Any idea how I could trick Postgres into the 1-second plan while still specifying the type I want ('bla') positively?
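A likely explanation for the CTE attempt, assuming PostgreSQL 12 or later: since that version, a non-recursive, side-effect-free CTE that is referenced only once is inlined into the outer query, so `res` is planned essentially like the first query and takes the same slow path through the type index. Writing `AS MATERIALIZED` turns that inlining off, so the CTE is planned on its own. The sketch below puts only the timestamp condition inside the CTE, which avoids the hard-coded `NOT IN` list altogether, and applies `type = 'bla'` to the CTE's output. It runs against a small hypothetical table `dat_demo` (not the real `datTable`); whether it matches the 1-second timing on the real data is untested, and a materialized CTE has to hold every recent row before filtering.

```sql
-- Hypothetical stand-in for datTable, filled with synthetic rows.
CREATE TEMP TABLE dat_demo (
    type        text      NOT NULL,
    "timestamp" timestamp NOT NULL
);

INSERT INTO dat_demo (type, "timestamp")
SELECT (ARRAY['bla', 'blu', 'bli', 'blo'])[1 + g % 4],
       now()::timestamp - g * interval '1 minute'
FROM generate_series(1, 10000) AS g;

-- AS MATERIALIZED (PostgreSQL 12+) keeps the CTE as a separate step:
-- the timestamp filter is planned alone, then type = 'bla' filters its output.
WITH recent AS MATERIALIZED (
    SELECT *
    FROM dat_demo
    WHERE "timestamp" > (CURRENT_DATE - INTERVAL '1 day')
)
SELECT * FROM recent WHERE type = 'bla';
```

This is a planner workaround rather than a fix; the two-column index discussed in the answers addresses the cause.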

EDIT: EXPLAIN ANALYZE for the last query

GroupAggregate  (cost=677400.59..677493.09 rows=3595 width=59) (actual time=4789.667..4803.183 rows=3527 loops=1)
  Group Key: event_historic.sender
  ->  Sort  (cost=677400.59..677412.48 rows=4756 width=23) (actual time=4789.646..4792.808 rows=68045 loops=1)
        Sort Key: event_historic.sender
        Sort Method: quicksort  Memory: 9469kB
        ->  Bitmap Heap Scan on event_historic  (cost=505379.21..677110.11 rows=4756 width=23) (actual time=4709.494..4769.437 rows=68045 loops=1)
              Recheck Cond: (("timestamp" > (CURRENT_DATE - '1 day'::interval)) AND ((type)::text = 'NEAR_TRANSFER'::text))
              Heap Blocks: exact=26404
              ->  BitmapAnd  (cost=505379.21..505379.21 rows=44676 width=0) (actual time=4706.080..4706.082 rows=0 loops=1)
                    ->  Bitmap Index Scan on event_historic_timestamp_idx  (cost=0.00..3393.89 rows=263109 width=0) (actual time=167.838..167.838 rows=584877 loops=1)
                          Index Cond: ("timestamp" > (CURRENT_DATE - '1 day'::interval))
                    ->  Bitmap Index Scan on event_historic_type_idx  (cost=0.00..501982.69 rows=45316549 width=0) (actual time=4453.071..4453.071 rows=44279973 loops=1)
                          Index Cond: ((type)::text = 'NEAR_TRANSFER'::text)
Planning Time: 0.385 ms
JIT:
  Functions: 10
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.505 ms, Inlining 18.102 ms, Optimization 87.745 ms, Emission 44.270 ms, Total 152.622 ms
Execution Time: 4809.099 ms

EDIT 2: After adding an index on (type, timestamp), the query is much faster:

HashAggregate  (cost=156685.88..156786.59 rows=8057 width=59) (actual time=95.201..96.511 rows=3786 loops=1)
  Group Key: sender
  Batches: 1  Memory Usage: 2449kB
  Buffers: shared hit=31041
  ->  Index Scan using typetimestamp on event_historic eh  (cost=0.57..156087.67 rows=47857 width=44) (actual time=12.244..55.921 rows=76220 loops=1)
        Index Cond: (((type)::text = 'NEAR_TRANSFER'::text) AND ("timestamp" > (CURRENT_DATE - '1 day'::interval)))
        Buffers: shared hit=31041
Planning:
  Buffers: shared hit=5
Planning Time: 0.567 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.543 ms, Inlining 0.000 ms, Optimization 1.221 ms, Emission 10.819 ms, Total 14.584 ms
Execution Time: 99.496 ms
  • No answer possible without EXPLAIN (ANALYZE, BUFFERS) output for the queries. Please add that to the question. Commented Dec 16, 2022 at 10:17
  • So it's the Bitmap Index Scan on event_historic_type_idx that is slow, but how can I improve it? Commented Dec 16, 2022 at 10:50
  • While the multicolumn index is surely a good answer, I still have to wonder what the heck is going on with your 2nd query that it is faster. Could you show the plan for that one? Commented Dec 16, 2022 at 21:00
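To capture the plan these comments ask for, prefix the query with `EXPLAIN (ANALYZE, BUFFERS)`; note that `ANALYZE` actually executes the query. The sketch below runs it on a small hypothetical table `plan_demo` with the same two single-column indexes that appear in the question's plan (on `type` and on `timestamp`); on the real database, only the `EXPLAIN` statement is needed, run against the real table.

```sql
-- Hypothetical stand-in table with single-column indexes like the question's.
CREATE TEMP TABLE plan_demo (
    type        text      NOT NULL,
    "timestamp" timestamp NOT NULL
);

INSERT INTO plan_demo (type, "timestamp")
SELECT (ARRAY['bla', 'blu', 'bli', 'blo'])[1 + g % 4],
       now()::timestamp - g * interval '1 minute'
FROM generate_series(1, 10000) AS g;

CREATE INDEX plan_demo_type_idx      ON plan_demo (type);
CREATE INDEX plan_demo_timestamp_idx ON plan_demo ("timestamp");
ANALYZE plan_demo;  -- refresh planner statistics for the new rows

-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds buffer usage (blocks found in cache vs. read in).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM plan_demo
WHERE type NOT IN ('blu', 'bli', 'blo')
  AND "timestamp" > (CURRENT_DATE - INTERVAL '1 day');
```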

2 Answers


You need a two-column index on ((type::text), timestamp) to make that query fast.
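A minimal sketch of such an index, created on a small hypothetical stand-in table (`event_historic_demo`, an assumption) so the snippet is self-contained; on the real table, only the `CREATE INDEX` statement is needed. The `::text` casts in the plans suggest `type` is a `varchar` column; a plain index on the column already serves `type = 'NEAR_TRANSFER'`, as the plan in EDIT 2 shows.

```sql
-- Hypothetical stand-in for event_historic.
CREATE TEMP TABLE event_historic_demo (
    type        varchar(64) NOT NULL,
    "timestamp" timestamp   NOT NULL
);

-- Column compared with = first, column compared with > second.
CREATE INDEX event_historic_demo_type_ts_idx
    ON event_historic_demo (type, "timestamp");
```

On a table the size of `event_historic`, `CREATE INDEX CONCURRENTLY` avoids blocking writes while the index is built, at the cost of a slower build; it cannot run inside a transaction block.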

Let me explain the reasoning behind the index order in detail. If type is first in the index, the index scan can start with the first index entry after ('NEAR_TRANSFER', <now - 1 day>) and scan all index entries until it hits the next type, so all the index entries that are found correspond to a result row. If the index order is the other way around, the scan has to start at the first entry after (<now - 1 day>, ...) and read all index entries up to the end of the index. It discards the index entries where type IS DISTINCT FROM 'NEAR_TRANSFER' and fetches the table rows for the remaining index entries. So this scan will fetch the same number of table rows, but has to read more index entries.
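The argument can be checked with a rough counting model: on hypothetical data (one row per minute for 10 days, cycling through four types), count how many index entries each column order has to walk for `type = 'NEAR_TRANSFER' AND timestamp > cutoff`. This only counts entries in the scanned key range; it ignores page layout and is not a measurement of a real index scan.

```sql
-- Hypothetical data: one row per minute for 10 days, cycling through 4 types.
CREATE TEMP TABLE scan_model AS
SELECT (ARRAY['NEAR_TRANSFER', 'TYPE_B', 'TYPE_C', 'TYPE_D'])[1 + g % 4] AS type,
       timestamp '2022-12-16 00:00' - g * interval '1 minute'            AS "timestamp"
FROM generate_series(1, 14400) AS g;

-- (type, timestamp) order: the scan walks one contiguous run, every entry matches.
SELECT count(*) AS entries_type_first
FROM scan_model
WHERE type = 'NEAR_TRANSFER'
  AND "timestamp" > timestamp '2022-12-15 00:00';

-- (timestamp, type) order: the scan walks every entry after the cutoff,
-- whatever its type, and discards the non-matching types after reading them.
SELECT count(*) AS entries_timestamp_first
FROM scan_model
WHERE "timestamp" > timestamp '2022-12-15 00:00';
```

With this data, the type-first scan walks 359 entries, all of them matches, while the timestamp-first scan walks 1439 entries to find the same 359 rows. The question's own plan has the same shape at scale: the timestamp condition alone matched 584877 index entries, while 68045 rows satisfied both conditions.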

It is an old myth that the most selective column should be the first in the index, but it is nonetheless a myth. For the reason described above, you should have the columns that are compared with = first in the index. The selectivity of the columns is irrelevant.

All this applies to a single query in isolation. But you always have to consider the other queries in the workload, and for them the column order may make a difference.


3 Comments

Why should type go first? Frank below suggests timestamp first.
@Laurenz Albe: Could you elaborate on the column order, especially the part about equality comparisons? I always start with the column that reduces the dataset the most, without looking at whether it is compared with = or >.
@FrankHeikens I have added an explanation.

A single index on timestamp and type might be faster:

CREATE INDEX idx1 ON datTable (timestamp, type);

Or maybe:

CREATE INDEX idx1 ON datTable (type, timestamp);

Check the query plan to see whether the new index is used. You may have to drop an old index as well, and most likely you could drop it anyway.
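If the (type, timestamp) variant is chosen, `type` is its leading column, so that index can also serve conditions on `type` alone; this is presumably why the old single-column index (`event_historic_type_idx` in the question's plan) could be dropped. A sketch on a hypothetical table `drop_demo`; before doing this on the real table, check that no other query in the workload relies on the old index.

```sql
-- Hypothetical table with an old single-column index and the new composite one.
CREATE TEMP TABLE drop_demo (
    type        text      NOT NULL,
    "timestamp" timestamp NOT NULL
);

CREATE INDEX drop_demo_type_idx    ON drop_demo (type);
CREATE INDEX drop_demo_type_ts_idx ON drop_demo (type, "timestamp");

-- The composite index's leading column covers lookups on type alone,
-- so the single-column index is largely redundant for those lookups.
DROP INDEX drop_demo_type_idx;
```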

4 Comments

That's an idea, thank you. Any idea why the column order in the index would change anything (the difference between your two solutions)?
Check the query plan: based on the timestamp you need just 584877 records; based on the type you need 44279973 records. That's why it makes sense for timestamp to go first.
@FrançoisRichard: Could you share the new query plan after creating the new index or indexes? And did the performance improve?
Sure, I edited the question. As you can see, performance improved dramatically.
