0

I am populating a MySQL table from a CSV file pulled from a third-party source. The CSV is updated every day, and I want to update a row in the MySQL table if a row with the same values in columns a, b and c already exists, and insert the row otherwise. I used LOAD DATA INFILE for the initial load, but now I need to apply the daily CSV pull as an update. I am familiar with INSERT ... ON DUPLICATE KEY UPDATE, but not in the context of a CSV import. Any advice on how to nest LOAD DATA LOCAL INFILE within INSERT ... ON DUPLICATE KEY UPDATE on a, b, c, or on whether that is even the best approach, would be greatly appreciated.

LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv' 
INTO TABLE db.tbl
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n' 
IGNORE 1 lines;     

2 Answers

7

Since you use LOAD DATA LOCAL INFILE, the default behaviour is equivalent to specifying IGNORE, i.e. input rows that duplicate an existing key value are skipped. But:

If you specify REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row.

So your update import could be:

LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv' 
REPLACE
INTO TABLE db.tbl
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n' 
IGNORE 1 lines;

https://dev.mysql.com/doc/refman/5.6/en/load-data.html
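REPLACE (like ON DUPLICATE KEY UPDATE) can only recognise an existing row through a PRIMARY KEY or UNIQUE index, so matching on a, b and c requires a composite unique index on those three columns. A minimal sketch, assuming the column names from the question; the index name is hypothetical:

```sql
-- uq_tbl_a_b_c is a hypothetical index name; a, b, c are the columns
-- the question wants to match on. Without a PRIMARY KEY or UNIQUE index
-- covering them, REPLACE has nothing to detect duplicates against.
ALTER TABLE db.tbl
  ADD UNIQUE KEY uq_tbl_a_b_c (a, b, c);
```

The ALTER TABLE fails if the table already contains duplicate (a, b, c) combinations, so those have to be cleaned up first. Also bear in mind that REPLACE deletes the old row and inserts a new one, so an auto-increment id changes and any column not supplied by the CSV falls back to its default.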

If you need more complicated merge logic, you could import the CSV into a temporary table and then issue INSERT ... SELECT ... ON DUPLICATE KEY UPDATE.
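A sketch of that staging-table approach, assuming db.tbl has exactly the columns a, b, c, d in the same order as the CSV (d stands in for any hypothetical non-key column), a UNIQUE index on (a, b, c), and a hypothetical staging table name:

```sql
-- Staging table with the same columns and indexes as the target.
-- db.tbl_staging is a hypothetical name; it disappears when the session ends.
CREATE TEMPORARY TABLE db.tbl_staging LIKE db.tbl;

-- Load the daily CSV into the staging table, not the live table.
LOAD DATA LOCAL INFILE 'C:\\Users\\nick\\Desktop\\folder\\file.csv'
INTO TABLE db.tbl_staging
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

-- New (a, b, c) combinations are inserted; existing ones get d updated
-- in place, so the row (and any auto-increment id) is kept.
INSERT INTO db.tbl (a, b, c, d)
SELECT a, b, c, d FROM db.tbl_staging
ON DUPLICATE KEY UPDATE d = VALUES(d);

DROP TEMPORARY TABLE db.tbl_staging;
```

Unlike REPLACE, this updates rows in place, and the UPDATE clause can hold whatever per-column logic you need. On MySQL 8.0.20 and later VALUES() in this position is deprecated (it still works but raises a warning).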


0

I found that the best way to do this is to load the file with the standard LOAD DATA LOCAL INFILE statement:

LOAD DATA LOCAL INFILE 
INTO TABLE db.table
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n' 
IGNORE 1 lines;

Then use the following to delete duplicates. Note that the command below compares db.table with itself by aliasing it as both a and b.

delete a.* from db.table a, db.table b
where a.id > b.id
and a.field1 = b.field1
and a.field2 = b.field2
and a.field3 = b.field3; 

For this method to work, the id field must be an auto-increment primary key. The command above then deletes every row that duplicates another row on field1 AND field2 AND field3. With a.id > b.id it deletes the row with the higher id, i.e. the newly imported copy, so the existing data is kept. Using < instead of > deletes the older row and keeps the freshly imported values, which is what an update of the existing data would do.
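Before running the DELETE, it can help to see which combinations are duplicated and which ids each variant would keep. A minimal sketch, assuming the same table and column names as the answer above:

```sql
-- One row per (field1, field2, field3) combination that occurs more than once.
-- With a.id > b.id the DELETE keeps oldest_id; with a.id < b.id it keeps newest_id.
SELECT field1, field2, field3,
       COUNT(*) AS copies,
       MIN(id)  AS oldest_id,
       MAX(id)  AS newest_id
FROM db.table
GROUP BY field1, field2, field3
HAVING COUNT(*) > 1;
```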
