PostgreSQL recursive CTE performance issue

Question

I'm trying to understand such a huge difference in performance of two queries.

Let's assume I have two tables. First one contains A records for some set of domains:

                                Table "public.dns_a"
 Column |          Type          | Modifiers | Storage  | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
 name   | character varying(125) |           | extended |              |             
 a      | inet                   |           | main     |              |             
Indexes:
    "dns_a_a_idx" btree (a)
    "dns_a_name_idx" btree (name varchar_pattern_ops)

Second table handles CNAME records:

                              Table "public.dns_cname"
 Column |          Type          | Modifiers | Storage  | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
 name   | character varying(256) |           | extended |              |             
 cname  | character varying(256) |           | extended |              |             
Indexes:
    "dns_cname_cname_idx" btree (cname varchar_pattern_ops)
    "dns_cname_name_idx" btree (name varchar_pattern_ops)

Now I'm trying to solve "simple" problem with getting all the domains pointing to the same IP address, including CNAME.

The first attempt to use CTE works kind of fine:

EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
    (
        SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
    )
    UNION ALL
        SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse; 

                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on names_traverse  (cost=3051757.20..4337044.86 rows=64264383 width=1064) (actual time=0.037..1697.444 rows=199 loops=1)
   CTE names_traverse
     ->  Recursive Union  (cost=0.57..3051757.20 rows=64264383 width=45) (actual time=0.036..1697.395 rows=199 loops=1)
           ->  Index Scan using dns_a_a_idx on dns_a  (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.035..0.064 rows=14 loops=1)
                 Index Cond: (a = '118.145.5.20'::inet)
           ->  Merge Join  (cost=4377.00..176448.06 rows=6426243 width=45) (actual time=498.101..848.648 rows=92 loops=2)
                 Merge Cond: ((c.cname)::text = (nt.name)::text)
                 ->  Index Scan using dns_cname_cname_idx on dns_cname c  (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2)
                 ->  Materialize  (cost=4376.44..4474.09 rows=19530 width=516) (actual time=0.039..0.084 rows=187 loops=2)
                       ->  Sort  (cost=4376.44..4425.27 rows=19530 width=516) (actual time=0.037..0.053 rows=100 loops=2)
                             Sort Key: nt.name USING ~<~
                             Sort Method: quicksort  Memory: 33kB
                             ->  WorkTable Scan on names_traverse nt  (cost=0.00..390.60 rows=19530 width=516) (actual time=0.001..0.007 rows=100 loops=2)
 Planning time: 0.130 ms
 Execution time: 1697.477 ms
(15 rows)

There are two loops in the example above, so if I make a simple outer join query, I get much better results:

EXPLAIN ANALYZE 
SELECT * 
FROM dns_a a 
LEFT JOIN dns_cname c1 ON (c1.cname=a.name) 
LEFT JOIN dns_cname c2 ON (c2.cname=c1.name) 
WHERE a.a='118.145.5.20';

                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=1.68..65674.19 rows=1953 width=114) (actual time=1.086..12.992 rows=189 loops=1)
   ->  Nested Loop Left Join  (cost=1.12..46889.57 rows=1953 width=69) (actual time=1.085..2.154 rows=189 loops=1)
         ->  Index Scan using dns_a_a_idx on dns_a a  (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.022..0.055 rows=14 loops=1)
               Index Cond: (a = '118.145.5.20'::inet)
         ->  Index Scan using dns_cname_cname_idx on dns_cname c1  (cost=0.56..19.70 rows=329 width=45) (actual time=0.137..0.148 rows=13 loops=14)
               Index Cond: ((cname)::text = (a.name)::text)
   ->  Index Scan using dns_cname_cname_idx on dns_cname c2  (cost=0.56..6.33 rows=329 width=45) (actual time=0.057..0.057 rows=0 loops=189)
         Index Cond: ((cname)::text = (c1.name)::text)
 Planning time: 0.452 ms
 Execution time: 13.012 ms
(10 rows)

Time: 13.787 ms

So, the performance difference is about 100 times and that's the thing that worries me. I like the convenience of recursive CTE and prefer to use it instead of doing dirty tricks on application side, but I don't get why the cost of Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2) is so high.

Am I missing something important regarding CTE or the issue is with something else?

Thanks!

Update: A friend of mine spotted the number of affected rows I missed Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2), it equals total number of rows in the table and, if I understand correctly, it performs full index scan without condition and I don't get where condition is missed.

Result: After applying SET LOCAL enable_mergejoin TO false; execution time is much, much better.

EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
    (
        SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
    )
    UNION ALL
        SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on names_traverse  (cost=4746432.42..6527720.02 rows=89064380 width=1064) (actual time=0.718..45.656 rows=199 loops=1)
   CTE names_traverse
     ->  Recursive Union  (cost=0.57..4746432.42 rows=89064380 width=45) (actual time=0.717..45.597 rows=199 loops=1)
           ->  Index Scan using dns_a_a_idx on dns_a  (cost=0.57..74.82 rows=2700 width=24) (actual time=0.716..0.717 rows=14 loops=1)
                 Index Cond: (a = '118.145.5.20'::inet)
           ->  Nested Loop  (cost=0.56..296507.00 rows=8906168 width=45) (actual time=11.276..22.418 rows=92 loops=2)
                 ->  WorkTable Scan on names_traverse nt  (cost=0.00..540.00 rows=27000 width=516) (actual time=0.000..0.013 rows=100 loops=2)
                 ->  Index Scan using dns_cname_cname_idx on dns_cname c  (cost=0.56..7.66 rows=330 width=45) (actual time=0.125..0.225 rows=1 loops=199)
                       Index Cond: ((cname)::text = (nt.name)::text)
 Planning time: 0.253 ms
 Execution time: 45.697 ms
(11 rows)

damon · Accepted Answer · 2021-03-05 22:38:15Z

6

The first query is slow because of the index scan, as you noted.

The plan has to scan the complete index in order to get dns_cname sorted by cname, which is needed for the merge join. A merge join requires that both input tables are sorted by the join key, which can either be done with an index scan over the complete table (as in this case) or by a sequential scan followed by an explicit sort.

You will notice that the planner grossly overestimates all row counts for the CTE evaluation, which is probably the root of the problem. For fewer rows, PostgreSQL might choose a nested loop join which would not have to scan the whole table dns_cname.

That may be fixable or not. One thing that I can see immediately is that the estimate for the initial value '118.145.5.20' is too high by a factor 139.5, which is pretty bad. You might fix that by running ANALYZE on dns_cname, perhaps after increasing the statistics target for the column:

ALTER TABLE dns_a ALTER a SET STATISTICS 1000;

See if that makes a difference.

If that doesn't do the trick, you can manually set enable_mergejoin and enable_hashjoin to off and see if a plan with a nested loop join is really better or not. If you can get away with changing these parameters for this one statement only (probably with SET LOCAL) and get a better result that way, that is another option you have.

edited Mar 5, 2021 at 22:38

damon

15.2k16 gold badges61 silver badges79 bronze badges

answered Sep 12, 2016 at 9:37

Laurenz Albe

257k22 gold badges313 silver badges389 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

icuken Over a year ago

That's amazing, thank you, Laurenz. I clustered dns_a by a index, dns_cname by cname index, disabled mergejoin and got 45 ms on the same query or 0.5 ms with warm cache.

Collectives™ on Stack Overflow

PostgreSQL recursive CTE performance issue

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related