1

I have a lot of records I need to insert into an already massive table (the table has several million rows and I'm inserting several thousand rows at a time).

The current syntax being used is a simple one:

INSERT INTO my_table (name, email) VALUES ('test','[email protected]'),('test2','[email protected]')

These requests are made in batches of 1 thousand rows at a time.

For reasons outside of my control I do not have the ability to declare email as a unique column (the table already has duplicates).

How can I add a conditional to each INSERT request where we skip the inserts of an email when that value is found in the database?

So let's say in the above request we find we already have a row with email [email protected]. I would want to skip that row but still insert the other row ([email protected]).

I'm struggling to find a syntax that works.

Using SQL Server and pydb.

Thanks in advance for any help! :D

0

1 Answer 1

1

This query should work

INSERT INTO my_table (name, email) 
SELECT name, email
FROM (VALUES ('test','[email protected]'),('test2','[email protected]')) AS Vals(name, email)
WHERE NOT EXISTS (SELECT 1 FROM my_table x WHERE x.email = Vals.email)

An index on the email column will make this as fast as it can be, but it will still be slower than a normal insert.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.