
I have a Postgres instance running on a 16-core/32 GB Windows Server workstation.

I followed performance improvement tips from places like this: https://www.postgresql.org/docs/9.3/static/performance-tips.html.

When I run an update like:

analyze;
update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid;

where fid is the primary key in both tables and each table has 68M records, it takes almost a day to run.

Is there any way to improve the performance of SQL statements like this? Should I write a stored procedure to process it record by record, for example?

  • Do you really run analyze before update? Commented Jun 5, 2018 at 20:21
  • Are you sure all rows need updating? You could add and (amazon_v2.states_id <> amazon.states_id or amazon_v2.geom <> amazon.geom) to reduce the number of rows that need to be changed. Commented Jun 5, 2018 at 20:35
  • Do you have an index on the states_id field? If a lot of rows in amazon_v2 are getting updated with a different value for states_id, you might want to drop the index (if present) on states_id and then rebuild it after your update. Commented Jun 5, 2018 at 21:02
  • The problem is that I exported some columns of the amazon table to CSV to submit to h2o. Now I need to return the results to the database in a new version of the amazon table, v2. The geom and states_id columns are not necessary in h2o, but I need them in amazon_v2. This update is taking so much time that I think I will export geom and states_id, process the data, and import all columns back using \copy. I suppose it will be faster. It would be nice if I could "force" Postgres to use more resources. Commented Jun 5, 2018 at 21:08
  • Please edit your question and add the create table statements for the tables in question (including all indexes), the query you are using, and the execution plan generated using explain (verbose). Formatted text please, no screenshots. Commented Jun 6, 2018 at 7:38

2 Answers


You don't show the execution plan, but I'd bet it's performing a full table scan on amazon_v2 and an index scan on amazon.

I don't see how to improve performance here, since it's close to optimal already. The only thing I can think of is to use table partitioning and parallelize the execution.

Another, totally different, strategy is to update only the "modified" rows. Maybe you can track those to avoid updating all 68 million rows every time.
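Picking up the comment under the question, a minimal sketch of that idea. is distinct from treats NULLs as comparable values, unlike <>; note that the meaning of = (and therefore of is distinct from) for PostGIS geometries varies between versions, so treat the geom comparison as an assumption to verify:

update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
  -- only touch rows whose values actually differ
  and (amazon_v2.states_id is distinct from amazon.states_id
       or amazon_v2.geom is distinct from amazon.geom);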




Your query is executed in one very long transaction. The transaction may be blocked by other writers; query pg_locks to check.
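For example, something like this shows ungranted lock requests while the update runs (a sketch over the standard pg_locks view):

select pid, locktype, mode, relation::regclass as rel, granted
from pg_locks
where not granted;  -- any rows here mean a session is waiting on a lock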

Long transactions have a negative impact on the performance of autovacuum. Does the execution time increase over time? If so, check for table bloat.
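A rough bloat indicator is the dead-tuple count in pg_stat_user_tables, e.g.:

select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
where relname in ('amazon', 'amazon_v2');  -- table names from the question

If n_dead_tup stays large, autovacuum is not keeping up.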

Performance usually improves when big transactions are divided into smaller batches. Unfortunately, the operation is then no longer atomic, and there is no golden rule for the optimal batch size.

You should also follow the advice from https://stackoverflow.com/a/50708451/6702373.

Let's sum it up:

  • Update modified rows only (if only a few rows are modified)

  • Check locks

  • Check table bloat

  • Check hardware utilization (related to other issues)

  • Split the operation into batches (see the first sketch after this list).

  • Replace updates with delete/truncate & insert/copy; this works if the update changes most rows (see the second sketch after this list).

  • (if nothing else helps) Partition table
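A minimal batching sketch, assuming fid is a reasonably dense integer key; the step of 1,000,000 is illustrative, and each statement should run in its own transaction from the client:

-- repeat for fid ranges [0, 1000000), [1000000, 2000000), ...
update amazon_v2
  set states_id = amazon.states_id,
      geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
  and amazon_v2.fid >= 0
  and amazon_v2.fid < 1000000;
-- commit, optionally vacuum amazon_v2, then run the next range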
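And a sketch of the delete/insert swap: when most rows change, writing the table once can beat updating it in place, because an in-place update leaves a dead tuple behind every live row. The amazon_v3 name and the carried-over column list are hypothetical:

create table amazon_v3 as
select v2.fid,
       a.states_id,  -- refreshed columns come from amazon
       a.geom
       -- plus the remaining amazon_v2 columns (hypothetical, adjust to your schema)
from amazon_v2 v2
join amazon a on a.fid = v2.fid;

alter table amazon_v3 add primary key (fid);
-- after validating, drop amazon_v2 and rename amazon_v3 in its place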

2 Comments

The suggestion that "smaller transactions" are faster is not true. 65 million transactions updating one row each will be slower (in total) than one transaction updating 65 million rows.
@a_horse_with_no_name: One very big, long transaction might or might not be faster than 65 million transactions updating one row each. Everything depends on the exact usage pattern of the system. Long-running transactions often decrease the overall performance of a database. I edited my answer and replaced "smaller transactions" with "batches" and rewrote the related parts.
