
I have a simple query like so:

 INSERT INTO myTable (col1, col2) VALUES 
  (1,2),
  (1,3),
  (2,2)

I need to make sure that no duplicate values are added, BUT the check has to span both columns: if a row with the same values in col1 AND col2 already exists, I don't want to insert. If a value matches in only one of those columns but not both, the insert should go through.

In other words let's say we have the following table:

 +-------------------------+
 |____col1____|___col2_____|
 |      1     |     2      |
 |      1     |     3      |
 |______2_____|_____2______|

Inserting values like (2,3) and (1,1) would be allowed, but (1,3) would not be allowed.

Is it possible to do a WHERE NOT EXISTS check a single time? I may need to insert 1000 values at one time and I'm not sure whether doing a WHERE check on every single insert row would be efficient.

EDIT: To add to the question: if there's a duplicate value across both columns, I'd like the query to skip that specific row and carry on inserting the other values rather than throwing an error.

1 Answer


What you might want to use is either a primary key or a unique index across those columns. Afterwards, you can use either replace into or just insert ignore:

create table myTable
(
    a int,
    b int,
    primary key (a,b)
);

-- Variant 1
replace into myTable(a,b) values (1, 2);

-- Variant 2
insert ignore into myTable(a,b) values (1,2);
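If the table from the question already exists, the same guarantee can be added without recreating it. A minimal sketch, assuming the columns are named col1 and col2 as in the question and that the table does not yet contain duplicate pairs (the index name uq_col1_col2 is made up for illustration):

```sql
-- Composite unique index: the pair (col1, col2) must be unique,
-- while each column on its own may still repeat.
-- This statement fails if duplicate pairs are already present.
alter table myTable
    add unique index uq_col1_col2 (col1, col2);
```

With this in place, (1,2) and (1,3) can coexist, but a second (1,3) is rejected.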

See Insert Ignore and Replace Into

Using the latter variant has the advantage that you don't change any record if it already exists (thus no need to rebuild any index) and would best match your needs regarding your question.
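Applied to the example from the question, a multi-row insert ignore might look like the sketch below. It assumes the table defined above already holds (1,2), (1,3) and (2,2); the final select is only there to check how many rows were actually written.

```sql
-- One statement for the whole batch; no per-row existence check is needed.
insert ignore into myTable (a, b) values
    (2, 3),   -- new pair: inserted
    (1, 1),   -- new pair: inserted
    (1, 3);   -- pair already present: skipped

-- Number of rows affected by the statement above (skipped rows are not counted).
select row_count();
```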

If, however, there are other columns that need to be updated when inserting a record violating a unique constraint, you can either use replace into or insert into ... on duplicate key update.
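As a rough illustration of the on duplicate key update variant, the sketch below uses a hypothetical table myNotes with an extra note column; neither the table nor the column comes from the question.

```sql
-- Hypothetical table: (a, b) is the key, note is the payload to refresh.
create table myNotes
(
    a    int,
    b    int,
    note varchar(100),
    primary key (a, b)
);

-- New pairs are inserted; for an existing pair only note is overwritten.
-- values(note) refers to the value the insert would have written;
-- newer MySQL releases also offer a row-alias syntax for this.
insert into myNotes (a, b, note) values
    (1, 2, 'first'),
    (1, 3, 'second')
on duplicate key update note = values(note);
```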

Replace into performs a real deletion before inserting the new record, whereas insert into ... on duplicate key update performs an update instead. The end result may look the same, so why are there two statements? The answer lies in the side effects:

Replace into will delete the old record before inserting the new one. This causes the index to be updated twice, delete and insert triggers get executed (if defined) and, most importantly, if you have a foreign key constraint (with on delete restrict or on delete cascade) defined, your constraint will behave exactly as if you had deleted the record manually and inserted the new version later on. This means that either your operation fails because the restriction is in place, or the delete is cascaded to the referencing table (i.e. related records there are deleted, although you only changed some column data).

On the other hand, when using on duplicate key update, update triggers will get fired, the indexes on changed columns will be rewritten once and, if a foreign key is defined on update cascade for one of the columns being changed, this operation is performed as well.
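The difference in side effects can be seen with a small parent/child pair. This is only a sketch: the tables parent and child are hypothetical, and InnoDB is assumed so that the foreign key is actually enforced.

```sql
create table parent
(
    id   int primary key,
    name varchar(50)
) engine = InnoDB;

create table child
(
    id        int primary key,
    parent_id int,
    foreign key (parent_id) references parent (id) on delete cascade
) engine = InnoDB;

insert into parent (id, name) values (1, 'old');
insert into child (id, parent_id) values (10, 1);

-- Updates the existing parent row in place; the child row is untouched.
insert into parent (id, name) values (1, 'new')
on duplicate key update name = values(name);

-- Deletes parent 1 first, so the cascade removes child 10 as well,
-- and only then inserts the new parent row.
replace into parent (id, name) values (1, 'newer');
```

Running select * from child after each of the last two statements shows whether the child row survived.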

To answer your question in the comments, as stated in the manual:

If you use the IGNORE modifier, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.

So, with ignore, all violations are treated as warnings rather than errors, which lets the insert run to completion. Without it, the statement would abort at the first violation and the insert could end up partially applied (except when using transactions). Duplicate-key violations, however, do not even produce such a warning. Either way, rows that duplicate an existing key are not inserted, while ignore ensures that all valid rows are (given that there is no system failure or out-of-memory condition).
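For the case raised in the comment, where only duplicate-key violations should be skipped and other errors should still abort the statement, a commonly used alternative to ignore is a no-op on duplicate key update. The sketch below assumes the myTable definition from above; it is one possible approach, not something the manual excerpt prescribes.

```sql
-- insert ignore downgrades errors to warnings; they can be listed
-- right after the statement.
insert ignore into myTable (a, b) values (2, 3), (1, 3);
show warnings;

-- Alternative: a duplicate (a, b) pair takes the update branch, which
-- assigns a column to itself and leaves the existing row as it is.
-- Errors other than the duplicate key are not turned into warnings.
insert into myTable (a, b) values (1, 1), (1, 3)
on duplicate key update a = a;
```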


1 Comment

Thank you for that. Just one question: if I use Insert Ignore, will it be a global ignore? I.e. will it only suppress errors if there's a duplicate entry on those two columns? What if there's an invalid value in the query, will that error be suppressed too? I would like the query to suppress errors only for duplicate values on those two columns, but still throw other errors if they occur.
