I have the following data model:

A parent table with ~310M rows:

Table parent:
  Column    | Type
------------+-------------------------------
id          | BIGINT (Primary key, sequence)
type        | VARCHAR
group       | VARCHAR
date        | TIMESTAMP
isok        | BOOLEAN

It has a partial index on (group, isok) WHERE isok = false.

And a child table with ~1000M rows:

Table child
  Column    | Type
------------+-------------------------------
parentid    | BIGINT (Foreign Key)
field1      | VARCHAR
field2      | VARCHAR

It has an index on (parentid).

One parent can have 0 to N children.

I need to execute this query:

SELECT p.id, p.type, p.date, c.field1, c.field2 
FROM parent p
LEFT OUTER JOIN child AS c ON p.id = c.parentid
WHERE p."group" = 'groupname' AND p.isok = false;

EXPLAIN ANALYZE tells me that the query plan is:

                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join  (cost=223072.57..34724441.40 rows=698363 width=65) (actual time=7944.249..933430.677 rows=286257 loops=1)
  Hash Cond: (c.parentid = p.id)
  ->  Seq Scan on child c  (cost=0.00..23840617.04 rows=1217573504 width=47) (actual time=0.005..488678.149 rows=1217573499 loops=1)
  ->  Hash  (cost=220871.38..220871.38 rows=176095 width=26) (actual time=206.169..206.169 rows=283686 loops=1)
        Buckets: 32768  Batches: 1  Memory Usage: 17731kB
        ->  Index Scan using parent_group_nok_idx on parent p  (cost=0.55..220871.38 rows=176095 width=26) (actual time=0.032..115.183 rows=283686 loops=1)
              Index Cond: (((group)::text = 'groupname'::text) AND (isok = false))
Total runtime: 933486.035 ms

When I disable sequential scans, the plan becomes:

                                                                   QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=1.13..35309490.28 rows=698363 width=65) (actual time=0.684..42144.558 rows=286257 loops=1)
   ->  Index Scan using parent_group_nok_idx on parent p  (cost=0.55..220871.38 rows=176095 width=26) (actual time=0.030..122.959 rows=283686 loops=1)
         Index Cond: (((group)::text = 'groupname'::text) AND (isok = false))
   ->  Index Scan using child_parentid_idx on child c  (cost=0.58..184.74 rows=1452 width=47) (actual time=0.145..0.147 rows=1 loops=283686)
         Index Cond: (parentid = p.id)
 Total runtime: 42200.478 ms
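
For reference, here is a minimal sketch of how sequential scans can be disabled for a single transaction only. The exact command used to produce the plan above is not shown, so the use of SET LOCAL here is an assumption. The column name group is quoted because GROUP is a reserved word in PostgreSQL.

```sql
BEGIN;

-- Assumption: limit the setting to this transaction with SET LOCAL.
-- Note that enable_seqscan = off does not forbid sequential scans;
-- it only makes the planner treat them as very expensive.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT p.id, p.type, p.date, c.field1, c.field2
FROM parent p
LEFT OUTER JOIN child AS c ON p.id = c.parentid
WHERE p."group" = 'groupname' AND p.isok = false;

-- The setting reverts automatically when the transaction ends.
ROLLBACK;
```

Scoping the setting to one transaction keeps other queries in the same session unaffected.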

What can I do (other than disabling sequential scans) to "force" the optimizer to use the index on child?

  • Please edit your question and add the complete execution plans. Commented Feb 24, 2016 at 15:35
  • @a_horse_with_no_name It's done :) Commented Feb 24, 2016 at 15:53

1 Answer

After looking at similar issues and reading this article: trumping-the-postgresql-query-planner, I tried rewriting the query with a CTE (common table expression). Here is the query I used:

WITH cte AS (
    SELECT id, type, date 
    FROM parent 
    WHERE "group" = 'groupname' AND isok = false
    ORDER BY id ASC
)
SELECT cte.id, cte.type, cte.date, c.field1, c.field2 
FROM cte LEFT OUTER JOIN child c ON c.parentid = cte.id;

Here is the resulting query plan:

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=236655.35..33098268.28 rows=238777013 width=56) (actual time=243.160..1473.618 rows=286257 loops=1)
   CTE cte
     ->  Sort  (cost=236214.54..236654.77 rows=176095 width=26) (actual time=243.135..314.067 rows=283686 loops=1)
           Sort Key: e.id
           Sort Method: quicksort  Memory: 34451kB
           ->  Index Scan using parent_group_nok_idx on parent  (cost=0.55..220871.38 rows=176095 width=26) (actual time=0.041..113.058 rows=283686 loops=1)
                 Index Cond: (((group)::text = 'groupname'::text) AND (isok = false))
   ->  CTE Scan on cte  (cost=0.00..3521.90 rows=176095 width=18) (actual time=243.140..449.385 rows=283686 loops=1)
   ->  Index Scan using child_parentid_idx on child c  (cost=0.58..173.03 rows=1356 width=46) (actual time=0.002..0.003 rows=1 loops=283686)
         Index Cond: (parentid = cte.id)
 Total runtime: 1526.945 ms

The index on child (child_parentid_idx) is now used, and the total runtime drops from about 933 s to about 1.5 s.
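
A note for newer PostgreSQL versions: the approach works here because the CTE is materialized (the plan shows a CTE Scan), which keeps the planner from reordering it into the join. From PostgreSQL 12 onward, a non-recursive, side-effect-free CTE that is referenced only once may be inlined into the outer query, so the planner could go back to the sequential scan. A minimal sketch, assuming PostgreSQL 12 or later, that asks for materialization explicitly:

```sql
-- Assumes PostgreSQL 12+; AS MATERIALIZED is not accepted by older versions.
-- "group" is quoted because GROUP is a reserved word.
WITH cte AS MATERIALIZED (
    SELECT id, type, date
    FROM parent
    WHERE "group" = 'groupname' AND isok = false
    ORDER BY id ASC
)
SELECT cte.id, cte.type, cte.date, c.field1, c.field2
FROM cte
LEFT OUTER JOIN child c ON c.parentid = cte.id;
```

It is worth confirming with EXPLAIN ANALYZE that the plan still shows a CTE Scan feeding the nested loop.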
