I have the following table:
CREATE TABLE transactions
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('transactions_sequence') PRIMARY KEY,
transaction_date TIMESTAMP DEFAULT NULL NULL,
transaction_type VARCHAR(255) DEFAULT NULL NULL,
merchant_id VARCHAR(255) DEFAULT NULL NULL,
-- Some more columns here
);
and the following index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
I have the following queries:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_type = 'a'
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
This query works just fine and I get an index scan:
Limit (cost=0.29..7.47 rows=1 width=13) (actual time=1.119..1.120 rows=0 loops=1)
-> Index Scan using transactions_transaction_type_idx on transactions (cost=0.29..7.47 rows=1 width=13) (actual time=1.118..1.118 rows=0 loops=1)
Index Cond: (((transaction_type)::text = 'a'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Filter: ((merchant_id)::text = 'some_merchant_id'::text)
Planning Time: 0.311 ms
Execution Time: 1.139 ms
However, when I need results that are independent of transaction_type:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
I still get the index only scan:
Limit (cost=38.08..38.19 rows=44 width=13) (actual time=0.108..0.115 rows=47 loops=1)
-> Sort (cost=38.08..38.19 rows=44 width=13) (actual time=0.107..0.110 rows=47 loops=1)
Sort Key: transaction_date DESC
Sort Method: quicksort Memory: 27kB
-> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.29..36.88 rows=44 width=13) (actual time=0.029..0.093 rows=47 loops=1)
Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning Time: 0.228 ms
Execution Time: 0.161 ms
I do have a list of all the potential transaction_type values, so I initially thought that this would be better:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_type IN ('a', 'b', 'c', ...) -- all the potential values here
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
but instead, depending on the number of values in the IN clause, I might get an additional filter in the query plan:
Limit (cost=38.29..38.40 rows=43 width=13) (actual time=0.110..0.118 rows=47 loops=1)
-> Sort (cost=38.29..38.40 rows=43 width=13) (actual time=0.109..0.112 rows=47 loops=1)
Sort Key: transaction_date DESC
Sort Method: quicksort Memory: 27kB
-> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.31..37.13 rows=43 width=13) (actual time=0.030..0.097 rows=47 loops=1)
Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Filter: ((transaction_type)::text = ANY ('{a,b,c,d,e,f}'::text[]))
Heap Fetches: 0
Planning Time: 0.340 ms
Execution Time: 0.142 ms
So even if I skip the middle transaction_type column, my index is still used. But which query am I better off with: the one with IN on transaction_type listing all the potential values, or the one without any filter on it? And how is my index still used without a filter on transaction_type?
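To compare the two variants side by side, something like the following could be run. This is only a sketch: it assumes EXPLAIN (ANALYZE, BUFFERS) is a fair yardstick, and the IN list uses the six values that appear in the Filter line of the plan above rather than the full list.

```sql
-- Variant 1: no predicate on transaction_type.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;

-- Variant 2: IN list on transaction_type.
-- Assumption: only the six values visible in the plan's Filter line are used here.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_type IN ('a', 'b', 'c', 'd', 'e', 'f')
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;
```

Comparing the buffer counts and actual times of the two plans on realistic data volumes should show whether the extra Filter costs anything measurable.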
Update:
So, an additional concern. Should I use this index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
or this index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
given that 50% of the time I won't have a restriction on transaction_type (i.e. no AND transaction_type IN ('a', 'b', 'c') clause).
The remaining 50% of the time, I am trying to eliminate half of my transaction_types, which sometimes yields an extra Filter: ((transaction_type)::text = ANY ('{a,b,c}'::text[])) condition even though the index contains transaction_type.
So even with the index containing transaction_type, I might get the extra Filter when the AND transaction_type IN (...) clause is not very restrictive.
So which index is better? The one containing transaction_type or not?
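One way to test this would be to build both candidates under distinct names and compare their sizes and plans. A minimal sketch, assuming the hypothetical index names transactions_merchant_type_date_idx and transactions_merchant_date_idx (neither exists in my schema):

```sql
-- Hypothetical names so both candidates can coexist during the test.
CREATE INDEX transactions_merchant_type_date_idx
    ON transactions (merchant_id, transaction_type, transaction_date DESC, id)
    WHERE merchant_id IS NOT NULL;

CREATE INDEX transactions_merchant_date_idx
    ON transactions (merchant_id, transaction_date DESC, id)
    WHERE merchant_id IS NOT NULL;

-- Refresh planner statistics before comparing plans.
ANALYZE transactions;

-- Compare the on-disk size of the two candidates.
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('transactions_merchant_type_date_idx',
                  'transactions_merchant_date_idx');
```

With the second index I would expect the planner to be able to read one merchant's rows already ordered by transaction_date DESC and drop the Sort node, but I have not verified that here.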