I am trying to import data into an empty SQL Server table while avoiding duplicates that exist in the source data.
Currently I do a bulk insert into a temp table, and then copy the data across using:
INSERT INTO Actual_table
SELECT * FROM Temp_table
So Temp_table and Actual_table have exactly the same structure; the only difference is that where Actual_table has a primary key on the PK field, I have given Temp_table a unique constraint on that same column and set it to ignore duplicates:
UNIQUE NONCLUSTERED (Col1) WITH (IGNORE_DUP_KEY = ON)
In other words:
Actual_table
Col1 (PK) Col2
Temp_table
Col1 (Unique, ignore duplicates) Col2
The Actual_table is empty when we start this process, and the duplicates to be avoided are only on the PK field (in other words, not DISTINCT across the whole row).
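To make the setup concrete, here is a minimal sketch of the DDL I'm describing (the column types here are just illustrative, not my real schema):

```sql
-- Temp_table: same shape as Actual_table, but with a unique
-- constraint that silently discards rows with a duplicate Col1
CREATE TABLE Temp_table (
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NULL,
    UNIQUE NONCLUSTERED (Col1) WITH (IGNORE_DUP_KEY = ON)
);

-- Actual_table: the real destination, with Col1 as the primary key
CREATE TABLE Actual_table (
    Col1 INT NOT NULL PRIMARY KEY,
    Col2 VARCHAR(50) NULL
);
```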
I have no idea if this is the best way to achieve this, and comments/suggestions would be appreciated.
Just to flesh out my thoughts further:
- Should I instead import straight into the actual table, adding the IGNORE_DUP_KEY option before importing and then removing it afterwards (is this even possible)?
- Or should I leave the IGNORE_DUP_KEY constraint off the Temp_table (which would make the bulk import faster), and instead tweak the copy-across query to skip the duplicates? If this is a good idea, could someone please show me the syntax to achieve this.
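My rough guess at what that copy-across query might look like, using ROW_NUMBER to keep one arbitrary row per Col1 value (I'm not sure this is the best approach, so corrections are welcome):

```sql
-- Number the rows within each Col1 group, then keep only the first
INSERT INTO Actual_table (Col1, Col2)
SELECT Col1, Col2
FROM (
    SELECT Col1, Col2,
           ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS rn
    FROM Temp_table
) AS src
WHERE rn = 1;
```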
I am using SQL Server 2014.