
We have a system that loads data and then runs data QC in PostgreSQL. The QC function's performance fluctuates drastically in one of our environments with no apparent pattern. I was able to track the fluctuation down to the following simple query inside the QC function:

WITH foo AS (SELECT full_address, jsonb_agg(gad_rec_id) AS gad_rec_ids
             FROM azgiv.v_full_addresses
             WHERE gad_gly_id = 495
             GROUP BY full_address
             HAVING count(1) > 1)
SELECT gad_nguid, gad_rec_id, foo.full_address
FROM azgiv.v_full_addresses
     JOIN foo ON foo.full_address = v_full_addresses.full_address
             AND v_full_addresses.gad_gly_id = 495;

When I ran into the slow-performance situation (Fig 2), I had to ANALYZE the table behind the view before the query plan changed back to the fast one (Fig 1). v_full_addresses is a simple view of a partitioned table, with a bunch of columns concatenated.
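
Concretely, that meant running something like the following; the table name here is a stand-in, since the real partitioned table behind the view isn't named above:

ANALYZE azgiv.full_addresses;  -- hypothetical name of the partitioned table behind v_full_addresses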

Here are two images of the query plans for the above query. I am a newbie when it comes to query optimization, and any help is greatly appreciated.

Fig 1: Fast query plan & Fig 2: Slow query plan

3 Comments
  • The images are hard to read and contain little information. You should show the EXPLAIN (ANALYZE, BUFFERS) output as text (see the sketch after these comments). Commented Oct 20, 2020 at 14:53
  • It might be a typo or a simplification for Stack Overflow, but as far as I can see the result of jsonb_agg() isn't used anywhere further on in the query; leaving it out might save you some CPU cycles. Also, if I may ask, what did you use to create those query plan diagrams? Commented Oct 22, 2020 at 8:55
  • @deroby I used pgAdmin 4 to generate the diagrams. Commented Nov 14, 2020 at 19:17
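
For reference, capturing the text plan suggested above is just a matter of prefixing the query; the statement below is the query from the question, unchanged, under EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS)
WITH foo AS (SELECT full_address, jsonb_agg(gad_rec_id) AS gad_rec_ids
             FROM azgiv.v_full_addresses
             WHERE gad_gly_id = 495
             GROUP BY full_address
             HAVING count(1) > 1)
SELECT gad_nguid, gad_rec_id, foo.full_address
FROM azgiv.v_full_addresses
     JOIN foo ON foo.full_address = v_full_addresses.full_address
             AND v_full_addresses.gad_gly_id = 495;

Note that EXPLAIN with the ANALYZE option actually executes the query, so run it when a moment of extra load is acceptable.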

1 Answer


If performance improves after you ANALYZE a table, that means that the database's knowledge about the distribution of the data is outdated.

The best remedy is to tell PostgreSQL to collect these statistics more often:

ALTER TABLE some_table SET (autovacuum_analyze_scale_factor = 0.02);

0.02 is five times lower than the default 0.1, so statistics will be gathered five times more often.
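
To check whether autovacuum is now keeping up, you can look at when statistics were last gathered; pg_stat_all_tables records both manual and automatic runs (the filter assumes the azgiv schema from the question):

SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE schemaname = 'azgiv'
ORDER BY relname;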

If the bad query plans are generated right after a bulk load, you must choose a different strategy. In this case the problem is that it takes up to a minute for auto-analyze to kick in and calculate new statistics.

In that case you should run an explicit ANALYZE at the end of the bulk load.
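
A minimal sketch of that approach (table and file names are hypothetical, since the load function isn't shown):

-- bulk load into the partitioned table behind the view
COPY azgiv.full_addresses FROM '/data/addresses.csv' WITH (FORMAT csv, HEADER);

-- refresh planner statistics immediately, before the QC queries run;
-- on a partitioned table, ANALYZE on the parent also recurses into the partitions
ANALYZE azgiv.full_addresses;

Note that autovacuum never processes the partitioned parent itself, only the leaf partitions, so an explicit ANALYZE is also the only way to keep the parent's own statistics up to date.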


3 Comments

I set the parameter on the 16 partitioned tables, which improved things but did not solve the consistency issue. So for the moment I added an ANALYZE of the table to the load function, right after the bulk data insertions. It doesn't feel ideal, at least until I have more time to play with autovacuum parameters such as naptime and max_workers. Thanks!
That is the elegant solution. See my updated answer.
Great. Thank you so much, Laurenz!
