UPDATE: I ended up using this method created by Johnny Bubriski and then modified it a bit to skip duplicates. It works like a charm and is apparently quite fast. Link: http://johnnycode.com/2013/08/19/using-c-sharp-sqlbulkcopy-to-import-csv-data-sql-server/
I have been searching for an answer to this but cannot seem to find it. I am doing a T-SQL bulk insert to load data from a CSV file into a table in a local database. My statement looks like this:
BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    ERRORFILE = 'C:\ProgramData\Tools_TextileMagazine\AdditionalFiles\BulkInsertErrors.txt'
)
GO
SELECT *
FROM Orders
GO
I get an exception when I try to insert duplicate rows (for example, loading the same CSV file twice), which causes the entire insert to stop and roll back. Understandable enough, since I am violating the primary key constraint. Right now I just show a message box to let users know that duplicates are present in the CSV file, but that is of course not a proper solution; in fact, it is not a solution at all. My question is: is there any way to skip the duplicate rows and insert only the rows that are not duplicates? Perhaps in a try/catch somehow?
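Just to clarify what I mean by "skip them": I have seen that SQL Server indexes have an IGNORE_DUP_KEY option, which discards duplicate-key rows with a warning instead of failing the whole statement. A minimal sketch, assuming the primary key is a constraint named PK_Orders on a column OrderId (both names are placeholders):

-- Recreate the primary key with IGNORE_DUP_KEY = ON
-- (PK_Orders and OrderId are placeholder names)
ALTER TABLE Orders DROP CONSTRAINT PK_Orders;
ALTER TABLE Orders ADD CONSTRAINT PK_Orders
    PRIMARY KEY (OrderId) WITH (IGNORE_DUP_KEY = ON);

With that in place, a BULK INSERT that hits duplicate keys reports "Duplicate key was ignored." as a warning and inserts the remaining rows instead of rolling back. I am not sure whether changing the key like this is acceptable in my case, though.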
If that is not possible, what would be the "correct" (for lack of a better word) way to import data from the CSV file? The exception is causing me a bit of trouble. I did read somewhere that you can set up a temporary table, load the data into it, and select the distinct rows between the two tables before inserting, roughly as sketched below. But is there really no easier way to do it with BULK INSERT?
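For reference, here is roughly what I understood that staging-table approach to look like. This is only a sketch, and the key column OrderId is a placeholder for whatever the actual primary key is:

-- Create an empty staging table with the same columns as Orders (no constraints)
SELECT * INTO #OrdersStaging FROM Orders WHERE 1 = 0;

BULK INSERT #OrdersStaging
FROM 'csvfile.csv'
WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '0x0a', FORMATFILE = 'formatfile.fmt');

-- DISTINCT guards against exact duplicate rows inside the CSV itself;
-- NOT EXISTS skips rows whose key is already in Orders.
INSERT INTO Orders
SELECT DISTINCT s.*
FROM #OrdersStaging AS s
WHERE NOT EXISTS (SELECT 1 FROM Orders AS o WHERE o.OrderId = s.OrderId);

DROP TABLE #OrdersStaging;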