
I have a Postgres instance running on a 16-core/32 GB Windows Server workstation.

I followed performance improvement tips from places like this: https://www.postgresql.org/docs/9.3/static/performance-tips.html.

When I run an update like:

analyze;
update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid;

where fid is the primary key in both tables and each table has 68M records, it takes almost a day to run.

Is there any way to improve the performance of SQL statements like this? Should I write a stored procedure to process it record by record, for example?

  • Do you really run analyze before update? Commented Jun 5, 2018 at 20:21
  • Are you sure all rows need updating? You could add and (amazon_v2.states_id <> amazon.states_id or amazon_v2.geom <> amazon.geom) to reduce the number of rows that need to be changed. Commented Jun 5, 2018 at 20:35
  • Do you have an index on the states_id field? If a lot of rows in amazon_v2 are getting updated with a different value for states_id, you might want to drop the index (if present) on states_id and then rebuild it after your update. Commented Jun 5, 2018 at 21:02
  • The problem is that I exported some columns of the amazon table to CSV to submit to h2o. Now I need to return the results to the database in a new version of the amazon table, v2. The geom and states_id columns are not necessary in h2o, but I need them in amazon_v2. This update is taking so much time that I think I will export geom and states_id, process the data, and import all columns back using \copy. I suppose it will be faster. It would be nice if I could "force" Postgres to use more resources. Commented Jun 5, 2018 at 21:08
  • Please edit your question and add the create table statements for the tables in question (including all indexes), the query you are using, and the execution plan generated using explain (verbose). Formatted text please, no screenshots. Commented Jun 6, 2018 at 7:38

2 Answers


You don't show the execution plan, but I'd bet it's performing a full table scan on amazon_v2 and an index scan on amazon.

I don't see how to improve performance here, since it's close to optimal already. The only thing I can think of is to use table partitioning and parallelize the execution.

Another, totally different, strategy is to update only the "modified" rows. Maybe you can track those to avoid updating all 68 million rows every time.
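Picking up the comment under the question, a minimal sketch of that idea. is distinct from treats NULLs as comparable values, unlike <>; note that the meaning of = (and therefore of is distinct from) for PostGIS geometries varies between versions, so treat the geom comparison as an assumption to verify:

update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
  -- only touch rows whose values actually differ
  and (amazon_v2.states_id is distinct from amazon.states_id
       or amazon_v2.geom is distinct from amazon.geom);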




Your query is executed in one very long transaction. The transaction may be blocked by other writers; query pg_locks to check.
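For example, something like this shows ungranted lock requests while the update runs (a sketch over the standard pg_locks view):

select pid, locktype, mode, relation::regclass as rel, granted
from pg_locks
where not granted;  -- any rows here mean a session is waiting on a lock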

Long transactions have a negative impact on the performance of autovacuum. Does the execution time increase over time? If so, check for table bloat.
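A rough bloat indicator is the dead-tuple count in pg_stat_user_tables, e.g.:

select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
where relname in ('amazon', 'amazon_v2');  -- table names from the question

If n_dead_tup stays large, autovacuum is not keeping up.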

Performance usually improves when big transactions are divided into smaller batches. Unfortunately, the operation is then no longer atomic, and there is no golden rule for the optimal batch size.

You should also follow the advice from https://stackoverflow.com/a/50708451/6702373.

Let's sum it up:

  • Update modified rows only (if only a few rows are modified)

  • Check locks

  • Check table bloat

  • Check hardware utilization (related to other issues)

  • Split the operation into batches (see the first sketch after this list).

  • Replace updates with delete/truncate & insert/copy; this works if the update changes most rows (see the second sketch after this list).

  • (if nothing else helps) Partition table
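A minimal batching sketch, assuming fid is a reasonably dense integer key; the step of 1,000,000 is illustrative, and each statement should run in its own transaction from the client:

-- repeat for fid ranges [0, 1000000), [1000000, 2000000), ...
update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
  and amazon_v2.fid >= 0
  and amazon_v2.fid < 1000000;
-- commit, optionally vacuum amazon_v2, then run the next range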
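And a sketch of the delete/insert swap: when most rows change, writing the table once can beat updating it in place, because an in-place update leaves a dead tuple behind every live row. The amazon_v3 name and the carried-over column list are hypothetical:

create table amazon_v3 as
select v2.fid,
       a.states_id,  -- refreshed columns come from amazon
       a.geom
       -- plus the remaining amazon_v2 columns (hypothetical, adjust to your schema)
from amazon_v2 v2
join amazon a on a.fid = v2.fid;

alter table amazon_v3 add primary key (fid);
-- after validating, drop amazon_v2 and rename amazon_v3 in its place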

2 Comments

The suggestion that "smaller transactions" are faster is not true. 65 million transactions updating one row each will be slower (in total) than one transaction updating 65 million rows.
@a_horse_with_no_name: One very big, long transaction might or might not be faster than 65 million transactions updating one row each. Everything depends on the exact usage pattern of the system. Long-running transactions often decrease the overall performance of a database. I edited my answer and replaced "smaller transactions" with "batches" and rewrote the related parts.
