PostgreSQL: Read explain plan

Question

I'm running Postgres 12. I have an explain plan like this:

Sort  (cost=87.71..88.75 rows=418 width=40)
  Sort Key: language.name DESC
  ->  Hash Join  (cost=1.14..69.51 rows=418 width=40)
        Hash Cond: (film.language_id = language.language_id)
        ->  Seq Scan on film  (cost=0.00..66.50 rows=418 width=21)
              Filter: (rating = ANY ('{R,PG-13}'::mpaa_rating[]))
        ->  Hash  (cost=1.06..1.06 rows=6 width=25)
              ->  Seq Scan on language  (cost=0.00..1.06 rows=6 width=25)

As I understand, the nodes that have the startup cost equals to 0 will be the nodes that gets run first, so these 2 nodes runs in parallel:

Seq Scan on film (cost=0.00..66.50 rows=418 width=21)
Seq Scan on language (cost=0.00..1.06 rows=6 width=25)

Then this node gets run: Hash (cost=1.06..1.06 rows=6 width=25).

Then this node: Hash Join (cost=1.14..69.51 rows=418 width=40)

And finally: Sort (cost=87.71..88.75 rows=418 width=40)

However, according to this post: https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan the execution order should be:

Seq Scan on language (cost=0.00..1.06 rows=6 width=25)
Hash (cost=1.06..1.06 rows=6 width=25)
Seq Scan on film (cost=0.00..66.50 rows=418 width=21)
Hash Join (cost=1.14..69.51 rows=418 width=40)
Sort (cost=87.71..88.75 rows=418 width=40)

So which is the correct answer ? Do I misunderstand the execution order ?

Laurenz Albe · Accepted Answer · 2020-05-15 13:53:08Z

2

The order is as follows:

sequential scan on language
simultaneously, a hash table is built from the result (the hash table cannot be used until it is complete, so its startup cost is equal to the total cost of the scan)
sequential scan on film
simultaneously, the hash join is computed by probing the hash table
as soon as the join is done, sorting can start

answered May 15, 2020 at 13:53

Laurenz Albe

257k22 gold badges312 silver badges388 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jjanes · Accepted Answer · 2020-05-15 16:53:24Z

As I understand, the nodes that have the startup cost equals to 0 will be the nodes that gets run first,

This is not correct. The startup cost is the estimated cost until it delivers its first row (not until it receives its first row). It includes the start up time of any of its child nodes, but not of any non-child nodes, even if it can't do any work until the non-child nodes does something else first.

so these 2 nodes runs in parallel:

Your plan does not show any parallel behavior. They are intercalated (my word from chemistry, I don't if there a CS jargon for it), but not parallel as in running on multiple CPU.

I think you are trying to read too much into it. The exact detail of the executor are only loosely tied to the exact details of the planner.

But for the executor, it goes something like this:

Start the seq can on films.
Once it finds a row which passes the filter, suspend this scan and go do the scan on language to completion, hashing the results
Assuming the hash table fit in memory, finish the hash join for the first row in films and if a match is found feed it to the sort, then finish the rest of the rows from films matching them to the hash table and feeding them to the sort as well.

Ever time a row is fed to sort, it has to deal with it. That could be as simple as inserting into a buffer area, or it could be sorting everything currently in the buffer area and writing the whole thing out to disk.

The reason for the weird little dance in the first two step is there is no reason to read and hash "language" if nothing from "films" is found. I don't think the planner acknowledges this dance at all.

Collectives™ on Stack Overflow

PostgreSQL: Read explain plan

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related