44

I'm a little bit confused reading about PostgreSQL deadlocks.

A typical deadlock example is:

-- Transaction 1
UPDATE customer SET ... WHERE id = 1
UPDATE customer SET ... WHERE id = 2

-- Transaction 2
UPDATE customer SET ... WHERE id = 2
UPDATE customer SET ... WHERE id = 1

But what if I change the code as follows:

-- Transaction 1
UPDATE customer SET ... WHERE id IN (1, 2)

-- Transaction 2
UPDATE customer SET ... WHERE id IN (1, 2)

Will be a possibility of deadlock here?

Essentially my question is: in the 2nd case does PostgreSQL lock rows one-by-one, or lock the entire scope covered by the WHERE condition?

Thanks in advance!

1
  • Your "typical deadlock example" block actually helped me to create a DB Deadlock in Postgres in Spring. I tried several different things to create one and this one finally enabled me to create one. Just wanted to see a deadlock in action and was finally able to produce in some test code. Commented Dec 21, 2023 at 15:48

3 Answers 3

53

In PostgreSQL the rows will be locked as they are updated -- in fact, the way this actually works is that each tuple (version of a row) has a system field called xmin to indicate which transaction made that tuple current (by insert or update) and a system field called xmax to indicate which transaction expired that tuple (by update or delete). When you access data, it checks each tuple to determine whether it is visible to your transaction, by checking your active "snapshot" against these values.

If you are executing an UPDATE and a tuple which matches your search conditions has an xmin which would make it visible to your snapshot and an xmax of an active transaction, it blocks, waiting for that transaction to complete. If the transaction which first updated the tuple rolls back, your transaction wakes up and processes the row; if the first transaction commits, your transaction wakes up and takes action depending on the current transaction isolation level.

Obviously, a deadlock is the result of this happening to rows in different order. There is no row-level lock in RAM which can be obtained for all rows at the same time, but if rows are updated in the same order you can't have the circular locking. Unfortunately, the suggested IN(1, 2) syntax doesn't guarantee that. Different sessions may have different costing factors active, a background "analyze" task may change statistics for the table between the generation of one plan and the other, or it may be using a seqscan and be affected by the PostgreSQL optimization which causes a new seqscan to join one already in progress and "loop around" to reduce disk I/O.

If you do the updates one at a time in the same order, in application code or using a cursor, then you will have only simple blocking, not deadlocks. In general, though, relational databases are prone to serialization failures, and it is best to access them through a framework which will recognize them based on SQLSTATE and automatically retry the entire transaction from the start. In PostgreSQL a serialization failure will always have a SQLSTATE of 40001 or 40P01.

http://www.postgresql.org/docs/current/interactive/mvcc-intro.html

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks! So my example above can cause a deadlock (because we don't know the order, in which the rows are processed in both transactions)?
It could cause a deadlock, although that would be rare; as opposed to the first example (explicitly choosing different orders), where it would be common. You could rule out deadlocks by taking a table-level lock of appropriate strength for the duration of every transaction which updates the table, but that cure may be worse than the disease. See the doc section I referenced for details.
But does PostgreSQL release lock after the row has been updated, but the whole UPDATE statement hasn't yet finished? In other words if we have a statement like UPDATE ... WHERE id IN (1,2,3,4,5) after postgresql updates, say, row with id=1 and proceeds with row with id=2, will it release the row id=1? if yes, how how will it roll the rows back if necessary?
Locks are held until commit or rollback.
7

does PostgreSQL lock rows one-by-one, or lock the entire scope

PostgreSQL locks rows one-by-one.

And frustratingly, there is no way to order updates (or deletes) like there is for selects and inserts.

The solution is to lock the records upfront with SELECT FOR UPDATE and self-join.

UPDATE customer AS c SET ...
FROM (
  SELECT ctid
  FROM customer
  WHERE id IN (1, 2)
  ORDER BY id -- the optimal ordering varies, but it must be strict and consistent
  FOR UPDATE
) AS c2
WHERE c.ctid = c2.ctid

(Here I use the physical ID of the row ctid which can be slightly faster for the join.)

PostgreSQL will find the records, lock the records in order, and then update the records.

You can examine the query plan to convince yourself of this.

There is some overhead, but it's minimal, especially considering that UPDATE is typically not a light operation in the first place.

2 Comments

🚨 CAUTION: It's tempting to use the ctid so Postgres can more quickly look up the relevant rows instead of needing to consult the index again, but that is not safe in the face of concurrent updates, at least with the default READ COMMITTED level. In particular, it's possible for the UPDATE to first fetch a stale copy of the rows according to the statement snapshot (one older than the FOR UPDATE result) and for the ctid JOIN to fail before it has a chance to recheck against the latest locked version of the rows: postgresql.org/docs/current/…
As best as I understand it, it would be safe if the SELECT FOR UPDATE and UPDATE queries are in separate statements within the same transaction.
1

Improve the SQL statement based on Paul Draper's solution:

UPDATE customer SET ...
WHERE id IN (
  SELECT id
  FROM customer
  WHERE id IN (1, 2)
  ORDER BY id
  FOR UPDATE
)

I did a test on PostgreSQL v10.1 with a shell script to loop the SQL statement 1000 times as background jobs. Where the original UPDATE command always run into the deadlock situation, both Paul Draper's command and mine were done successfully. There was no significant performance difference when I tested them on a small table with only two rows.

I don't have the knowledge of internal PostgreSQL implementation, which could affect the result if different versions have different optimization methods on the SQL commands.

Based on the Erwin Brandstetter's answer for a similar issue, this should work as well:

BEGIN;
SELECT id FROM customer WHERE id IN (1, 2)
  ORDER BY id
  FOR UPDATE;
UPDATE customer SET ... WHERE id IN (1, 2);
END;

4 Comments

It doesn't looks like improvement. AFAIK it's back again into the problem described in kgrittn answer.
@RabbanKeyak Thank for the comment. I have tested it successfully.
About the SELECT FOR UPDATE followed by UPDATE: Depending on the exact WHERE clause, the UPDATE might touch different rows than the SELECT!
@DavidBalažic Please check Erwin Brandstetter's answer from the original post dba.stackexchange.com/questions/257217

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.