I'm working with PostgreSQL 9.1. Let's say I have a table where some columns have UNIQUE constraint. The easiest example:

CREATE TABLE test (
    value INTEGER NOT NULL UNIQUE
);

Now, when inserting some values, I have to separately handle the case where the values to be inserted are already in the table. I have two options:

  • Make a SELECT beforehand to ensure the values are not in the table, or:
  • Execute the INSERT and watch for any errors the server might return.

The application utilizing the PostgreSQL database is written in Ruby. Here's how I would code the second option:

require 'pg'

db = PG.connect(...)

begin
    db.exec('INSERT INTO test VALUES (66)')
rescue PG::UniqueViolation
    # ... the values are already in the table
else
    # ... the values were brand new
end

db.close

Here's my thinking: let's suppose we make a SELECT first, before inserting. The SQL engine would have to scan the rows and return any matching tuples. If there are none, we make an INSERT, which presumably triggers yet another scan to check that the UNIQUE constraint is not about to be violated. So, in theory, the second option would speed execution up by 50%. Is this how PostgreSQL would actually behave?
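
For reference, here's roughly what the first option boils down to at the SQL level (I realize that, without locking, this is racy: another session could insert the same value between the two statements):

    -- Option 1: check first, then insert (two round trips).
    SELECT 1 FROM test WHERE value = 66;

    -- Only if the SELECT returned no rows:
    INSERT INTO test VALUES (66);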

We're assuming there's no ambiguity when it comes to the exception itself (e.g. we only have one UNIQUE constraint).

Is it a common practice? Or are there any caveats to it? Are there any more alternatives?

2 Answers

It depends. If your application UI normally allows entering duplicate values, then you should check before inserting, because any error would invalidate the current transaction, consume sequence/serial values, fill the logs with error messages, and so on.

But if your UI does not allow duplicates, and inserting a duplicate is only possible when somebody is using tricks (for example, during vulnerability research) or is otherwise highly improbable, then I'd insert without checking first.

As a unique constraint forces the creation of an index, this check is not slow, though it is still slightly slower than inserting and checking for rare errors. Postgres 9.5 will have ON CONFLICT DO NOTHING support, which will be both fast and safe; you'd check the number of rows inserted to detect duplicates.
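
For illustration, on 9.5 that would look roughly like this (a sketch only; the syntax does not exist on your 9.1):

    -- PostgreSQL 9.5+ only; not available on 9.1.
    INSERT INTO test (value)
    VALUES (66)
    ON CONFLICT (value) DO NOTHING;

    -- The command tag reports the number of rows inserted:
    -- "INSERT 0 1" means the row was new,
    -- "INSERT 0 0" means it was a duplicate and was skipped.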

2 Comments

+1 for ON CONFLICT and checking number of inserted rows. Official documentation for ON CONFLICT is here.
The UI reflects the structure of the database itself, and since I declared UNIQUE constraints in the database, the UI doesn't allow the duplicates either. The suggested ON CONFLICT syntax with a check on the number of inserted rows looks like a perfect SQL-only alternative. Too bad I won't have a chance to get my hands on it, though.

You don't have to (and shouldn't) test beforehand; you can test while inserting. Just add the test as a WHERE clause. The following INSERT inserts either zero or one tuple, depending on the existence of a row with the same value (and it is certainly not slower):

INSERT INTO test (value)
SELECT 55
WHERE NOT EXISTS (
    SELECT * FROM test
    WHERE value = 55
    );

Though your error-driven approach may look elegant from the client side, from the database side it is a near-disaster: the current transaction is rolled back implicitly and all cursors (including prepared statements) are closed. (Thus: your application will have to rebuild the complete transaction, this time without the error, and start again.)
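
If you do want to keep the error-driven style inside a larger transaction, one workaround would be to wrap each INSERT in a savepoint, so that only that one statement is rolled back. Roughly:

    BEGIN;
    SAVEPOINT before_insert;
    INSERT INTO test VALUES (66);
    -- If the server raised unique_violation, the client issues:
    ROLLBACK TO SAVEPOINT before_insert;
    -- ...and the rest of the transaction can proceed normally.
    COMMIT;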


Addition: when adding more than one row you can put the VALUES() into a CTE and refer to the CTE in the insert query:

WITH vvv(val) AS (
    VALUES (11),(22),(33),(44),(55),(66)
    )
INSERT INTO test(value)
SELECT val FROM vvv
WHERE NOT EXISTS (
    SELECT *
    FROM test nx
    WHERE nx.value = vvv.val
    );

-- SELECT * FROM test;
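
One caveat worth sketching: the NOT EXISTS only guards against rows already in the table, so if the VALUES list itself contains duplicates the statement will still fail. De-duplicating the CTE output avoids that:

    WITH vvv(val) AS (
        VALUES (11),(22),(22),(33)   -- note the in-batch duplicate
        )
    INSERT INTO test(value)
    SELECT DISTINCT val FROM vvv
    WHERE NOT EXISTS (
        SELECT *
        FROM test nx
        WHERE nx.value = vvv.val
        );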

1 Comment

Thank you for the alternative SQL syntax. Although I didn't explicitly say it in the question, I'm also inserting multiple rows in a single INSERT statement, like so: INSERT INTO test VALUES (66), (67);. Could the statement you provided be modified to allow insertion of multiple rows? About the error-driven solution: once the SQL server returns an error, the client request is done being processed and it returns; it doesn't recover from the error, as the error itself renders the request invalid. So this approach still stands, as it still seems faster and semantically correct.
