I have the following data model.
A parent table with ~310M rows:
Table parent:
   Column   | Type
------------+-------------------------------
 id         | BIGINT (Primary key, sequence)
 type       | VARCHAR
 group      | VARCHAR
 date       | TIMESTAMP
 isok       | BOOLEAN
It has a partial index on (group, isok) WHERE isok = false.
And a child table with ~1000M rows:
Table child:
   Column   | Type
------------+-------------------------------
 parentid   | BIGINT (Foreign Key)
 field1     | VARCHAR
 field2     | VARCHAR
It has an index on (parentid).
One parent can have 0 to N children.
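For reference, the model above corresponds roughly to the DDL below. This is only a sketch: the index names are taken from the query plans, while the use of BIGSERIAL, the unbounded VARCHAR columns, and the quoting of "group" (a reserved word in PostgreSQL) are assumptions.

```sql
-- Sketch of the schema described above; exact types and constraints are assumptions.
CREATE TABLE parent (
    id      BIGSERIAL PRIMARY KEY,   -- BIGINT fed by a sequence
    type    VARCHAR,
    "group" VARCHAR,                 -- quoted because GROUP is a reserved word
    date    TIMESTAMP,
    isok    BOOLEAN
);

-- Partial index: only rows with isok = false are indexed.
CREATE INDEX parent_group_nok_idx
    ON parent ("group", isok)
    WHERE isok = false;

CREATE TABLE child (
    parentid BIGINT REFERENCES parent (id),
    field1   VARCHAR,
    field2   VARCHAR
);

-- Plain b-tree index used to look up the children of one parent.
CREATE INDEX child_parentid_idx ON child (parentid);
```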
I need to execute this query:
SELECT p.id, p.type, p.date, c.field1, c.field2
FROM parent p
LEFT OUTER JOIN child AS c ON p.id = c.parentid
WHERE p."group" = 'groupname' AND p.isok = false;
EXPLAIN ANALYZE tells me that the query plan is:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join (cost=223072.57..34724441.40 rows=698363 width=65) (actual time=7944.249..933430.677 rows=286257 loops=1)
   Hash Cond: (c.parentid = p.id)
   -> Seq Scan on child c (cost=0.00..23840617.04 rows=1217573504 width=47) (actual time=0.005..488678.149 rows=1217573499 loops=1)
   -> Hash (cost=220871.38..220871.38 rows=176095 width=26) (actual time=206.169..206.169 rows=283686 loops=1)
         Buckets: 32768 Batches: 1 Memory Usage: 17731kB
         -> Index Scan using parent_group_nok_idx on parent p (cost=0.55..220871.38 rows=176095 width=26) (actual time=0.032..115.183 rows=283686 loops=1)
               Index Cond: (((group)::text = 'groupname'::text) AND (isok = false))
 Total runtime: 933486.035 ms
When I disable sequential scans (enable_seqscan = off), the plan becomes:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (cost=1.13..35309490.28 rows=698363 width=65) (actual time=0.684..42144.558 rows=286257 loops=1)
   -> Index Scan using parent_group_nok_idx on parent p (cost=0.55..220871.38 rows=176095 width=26) (actual time=0.030..122.959 rows=283686 loops=1)
         Index Cond: (((group)::text = 'groupname'::text) AND (isok = false))
   -> Index Scan using child_parentid_idx on child c (cost=0.58..184.74 rows=1452 width=47) (actual time=0.145..0.147 rows=1 loops=283686)
         Index Cond: (parentid = p.id)
 Total runtime: 42200.478 ms
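To make the comparison reproducible, the second plan can be obtained roughly as sketched below. It assumes the session-level planner setting enable_seqscan was what turned sequential scans off, and that the "group" column has to be quoted because GROUP is a reserved word.

```sql
-- Discourage sequential scans for this session only (not a hard prohibition).
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT p.id, p.type, p.date, c.field1, c.field2
FROM parent p
LEFT OUTER JOIN child AS c ON p.id = c.parentid
WHERE p."group" = 'groupname' AND p.isok = false;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```

With the setting off, the planner picks the nested loop even though its estimated cost (35309490.28) is slightly higher than the hash join's (34724441.40). The index plan is about 22 times faster in practice (42 s versus 933 s).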
What can I do (other than disabling sequential scans) to make the optimizer choose the index-based plan?