I'm working with PostgreSQL 9.1. Let's say I have a table where some columns have UNIQUE constraint. The easiest example:

CREATE TABLE test (
    value INTEGER NOT NULL UNIQUE
);

Now, when inserting some values, I have to separately handle the case where the values to be inserted are already in the table. I have two options:

  • Make a SELECT beforehand to ensure the values are not in the table, or:
  • Execute the INSERT and watch for any errors the server might return.

The application utilizing the PostgreSQL database is written in Ruby. Here's how I would code the second option:

require 'pg'

db = PG.connect(...)

begin
    db.exec('INSERT INTO test VALUES (66)')
rescue PG::UniqueViolation
    # ... the values are already in the table
else
    # ... the values were brand new
end

db.close

Here's my thinking: let's suppose we make a SELECT first, before inserting. The SQL engine would have to scan the rows and return any matching tuples. If there are none, we make an INSERT, which presumably triggers yet another scan to check that the UNIQUE constraint is not about to be violated. So, in theory, the second option would speed execution up by 50%. Is this how PostgreSQL would actually behave?
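
For reference, here's roughly what the first option boils down to at the SQL level (I realize that, without locking, this is racy: another session could insert the same value between the two statements):

    -- Option 1: check first, then insert (two round trips).
    SELECT 1 FROM test WHERE value = 66;

    -- Only if the SELECT returned no rows:
    INSERT INTO test VALUES (66);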

We're assuming there's no ambiguity when it comes to the exception itself (e.g. we only have one UNIQUE constraint).

Is it a common practice? Or are there any caveats to it? Are there any more alternatives?

2 Answers

It depends. If your application UI normally allows entering duplicate values, then you should check before inserting, because any error would invalidate the current transaction, consume sequence/serial values, fill the logs with error messages, and so on.

But if your UI does not allow duplicates, and inserting a duplicate is only possible when somebody is using tricks (for example, during vulnerability research) or is otherwise highly improbable, then I'd insert without checking first.

As a unique constraint forces the creation of an index, this check is not slow, though it is still slightly slower than inserting and checking for rare errors. Postgres 9.5 will have ON CONFLICT DO NOTHING support, which will be both fast and safe; you'd check the number of rows inserted to detect duplicates.
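
For illustration, on 9.5 that would look roughly like this (a sketch only; the syntax does not exist on your 9.1):

    -- PostgreSQL 9.5+ only; not available on 9.1.
    INSERT INTO test (value)
    VALUES (66)
    ON CONFLICT (value) DO NOTHING;

    -- The command tag reports the number of rows inserted:
    -- "INSERT 0 1" means the row was new,
    -- "INSERT 0 0" means it was a duplicate and was skipped.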

2 Comments

+1 for ON CONFLICT and checking number of inserted rows. Official documentation for ON CONFLICT is here.
The UI reflects the structure of the database itself, and since I declared UNIQUE constraints in the database, the UI doesn't allow the duplicates either. The suggested ON CONFLICT syntax with a check on the number of inserted rows looks like a perfect SQL-only alternative. Too bad I won't have a chance to get my hands on it, though.

You don't have to (and shouldn't) test beforehand; you can test while inserting. Just add the test as a WHERE clause. The following INSERT inserts either zero or one tuple, depending on the existence of a row with the same value (and it is certainly not slower):

INSERT INTO test (value)
SELECT 55
WHERE NOT EXISTS (
    SELECT * FROM test
    WHERE value = 55
    );

Though your error-driven approach may look elegant from the client side, from the database side it is a near-disaster: the current transaction is rolled back implicitly and all cursors (including prepared statements) are closed. (Thus: your application will have to rebuild the complete transaction, this time without the error, and start again.)
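
If you do want to keep the error-driven style inside a larger transaction, one workaround would be to wrap each INSERT in a savepoint, so that only that one statement is rolled back. Roughly:

    BEGIN;
    SAVEPOINT before_insert;
    INSERT INTO test VALUES (66);
    -- If the server raised unique_violation, the client issues:
    ROLLBACK TO SAVEPOINT before_insert;
    -- ...and the rest of the transaction can proceed normally.
    COMMIT;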


Addition: when adding more than one row you can put the VALUES() into a CTE and refer to the CTE in the insert query:

WITH vvv(val) AS (
    VALUES (11),(22),(33),(44),(55),(66)
    )
INSERT INTO test(value)
SELECT val FROM vvv
WHERE NOT EXISTS (
    SELECT *
    FROM test nx
    WHERE nx.value = vvv.val
    );

-- SELECT * FROM test;
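
One caveat worth sketching: the NOT EXISTS only guards against rows already in the table, so if the VALUES list itself contains duplicates the statement will still fail. De-duplicating the CTE output avoids that:

    WITH vvv(val) AS (
        VALUES (11),(22),(22),(33)   -- note the in-batch duplicate
        )
    INSERT INTO test(value)
    SELECT DISTINCT val FROM vvv
    WHERE NOT EXISTS (
        SELECT *
        FROM test nx
        WHERE nx.value = vvv.val
        );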

1 Comment

Thank you for the alternative SQL syntax. Although I didn't explicitly say it in the question, I'm also inserting multiple rows in a single INSERT statement, like so: INSERT INTO test VALUES (66), (67);. Could the statement you provided be modified to allow insertion of multiple rows? About the error-driven solution: once the SQL server returns an error, the client request is done being processed and it returns; it doesn't recover from the error, as the error itself renders the request invalid. So this approach still stands, as it still seems faster and semantically correct.
