
I'm migrating data from one system to another and will be receiving a CSV file with the data to import. The file could contain up to a million records. I need to read each line in the file, validate it and put the data into the relevant tables. For example, a row in the CSV would look like:

Mr,Bob,Smith,1 high street,London,ec1,012345789,work(this needs to be looked up in another table to get the ID)

There's a lot more data than this example in the real files.

So, the SQL would be something like this:

DECLARE @UserID int

INSERT INTO [User]
VALUES ('Mr', 'Bob', 'Smith', '0123456789')

SET @UserID = SCOPE_IDENTITY()

INSERT INTO Address
VALUES (@UserID, '1 high street', 'London', 'ec1',
        (SELECT ID FROM AddressType WHERE AddressTypeName = 'work'))

I was thinking of iterating over each row and calling a stored procedure with the parameters from the file; the SP would contain the SQL above. Would this be the best way of tackling this? It's not time critical, as this will just be run once when updating a site.

I'm using C# and SQL Server 2008 R2.
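
For concreteness, here is a minimal sketch of the per-row stored procedure described above, wrapped in a transaction; the table and column names are assumptions rather than the real schema:

-- Hypothetical per-row import procedure; names and types are assumptions.
CREATE PROCEDURE dbo.ImportUserRow
    @Title       varchar(10),
    @FirstName   varchar(50),
    @LastName    varchar(50),
    @Street      varchar(100),
    @City        varchar(50),
    @Postcode    varchar(10),
    @Phone       varchar(20),
    @AddressType varchar(20)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @UserID int;

    BEGIN TRANSACTION;

    INSERT INTO [User] (Title, FirstName, LastName, Phone)
    VALUES (@Title, @FirstName, @LastName, @Phone);

    -- SCOPE_IDENTITY() is safer than @@IDENTITY (not affected by triggers).
    SET @UserID = SCOPE_IDENTITY();

    INSERT INTO Address (UserID, Street, City, Postcode, AddressTypeID)
    SELECT @UserID, @Street, @City, @Postcode, ID
    FROM AddressType
    WHERE AddressTypeName = @AddressType;

    COMMIT TRANSACTION;
END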

  • For importing CSV into MS SQL, why don't you just use the import wizard (see HOWTO here)? Commented Jan 9, 2013 at 21:46
  • I would follow TomTom's advice but use BULK INSERT, which you can automate, vs. the import wizard, which will be a challenge to do so... Commented Jan 9, 2013 at 21:52
  • Note that oftentimes when you can't import the file directly, it can be easier to have a batch job that just transforms the file into another file that can be uploaded directly to the DB through whatever built-in importing mechanisms it supports. Commented Jan 9, 2013 at 21:53
  • @AaronBertrand The whole point of the wizard is that you don't have to automate it for a one-time upload. Commented Jan 9, 2013 at 21:53
  • And still, a BULK INSERT command could be just as quick to write as stepping through all the steps in that heinous wizard (a sketch follows below)... Commented Jan 9, 2013 at 21:56
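
A minimal sketch of the BULK INSERT route suggested in these comments; the staging table, its column names and sizes, and the file path are assumptions, not from the question:

-- Hypothetical staging table matching the sample row.
CREATE TABLE dbo.ImportStaging
(
    Title           varchar(50),
    FirstName       varchar(100),
    LastName        varchar(100),
    Street          varchar(200),
    City            varchar(100),
    Postcode        varchar(20),
    Phone           varchar(50),
    AddressTypeName varchar(50)
);

BULK INSERT dbo.ImportStaging
FROM 'C:\import\users.csv'       -- hypothetical path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 1          -- use 2 if the file has a header row
);

Note that BULK INSERT does simple delimiter splitting; if fields can contain embedded commas or quotes, the file needs pre-processing (or a format file) first.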

2 Answers


What about loading it into a temporary table (note that this may be logically temporary, not necessarily technically) as staging, then processing it from there? This is standard ETL behaviour (and a million rows is tiny for ETL): you first stage the data, then clean it, then put it in its final place.


8 Comments

That sounds good, I'll look into doing this. So basically I create a temp table, bulk load the data into it, and I can then validate the data in my program and use SPs to move the data to where it needs to go...
If I do a bulk insert into a temp table, is there any way to process the data without using a cursor? So in my example above, I would bulk insert the CSV data into the temp table, then I would need to process each row in the temp table to first insert the user, get the ID and use that to insert the rest of the data into the relevant table...
Sure. 99% of all SQL can happen without a cursor; it is mostly people who never learned SQL properly who tend to use cursors for things that are wonderful examples of set-oriented logic.
My SQL is only average and I can't see how you'd do it. If I have a temp table with the data above, how would you insert the user first, get the identity value, then use that to insert the rest of the data, for all of the rows in the temp table, without using cursors? Any pointers on what to use?
Two statements. First merge into the target table, then merge into the second. Look up the MERGE statement.
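
A hedged sketch of that two-statement approach, reusing the hypothetical staging table from the comments above; all table and column names (including UserID as the identity column) are assumptions. MERGE is used for the first insert because its OUTPUT clause can return source columns, which gives a mapping from each staging row to its new identity value:

-- Add a row key after the bulk load so BULK INSERT's positional mapping still matches the file.
ALTER TABLE dbo.ImportStaging ADD StagingID int IDENTITY(1,1);

DECLARE @UserMap TABLE (StagingID int, UserID int);

-- Statement 1: insert every staging row into [User] and capture StagingID -> UserID.
MERGE [User] AS tgt
USING dbo.ImportStaging AS src
    ON 1 = 0                               -- never matches, so every source row is inserted
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Title, FirstName, LastName, Phone)
    VALUES (src.Title, src.FirstName, src.LastName, src.Phone)
OUTPUT src.StagingID, inserted.UserID INTO @UserMap (StagingID, UserID);

-- Statement 2: insert addresses using the captured mapping and the AddressType lookup.
INSERT INTO Address (UserID, Street, City, Postcode, AddressTypeID)
SELECT m.UserID, s.Street, s.City, s.Postcode, atype.ID
FROM dbo.ImportStaging AS s
JOIN @UserMap AS m ON m.StagingID = s.StagingID
JOIN AddressType AS atype ON atype.AddressTypeName = s.AddressTypeName;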

When performing tasks of this nature, you do not think in terms of rotating through each record individually, as that will be a huge performance problem. In this case you bulk insert the records into a staging table, or use the wizard to import into a staging table (look out for the default 50 characters, especially in the address field).

Then you write set-based code to do any clean-up you need: removing bad telephone numbers, zip codes or email addresses, removing records missing data in fields that are required in your database, and transforming data using lookup tables. Suppose you have a table with certain required values; those are likely not the same values that you will find in this file, so you need to convert them. We use doctor specialties a lot, so our system might store a specialty as GP but the file might give us a value of General Practitioner. You need to look at all the non-matching values for the field and then determine whether you can map them to existing values, whether you need to throw the record out, or whether you need to add more values to your lookup table.

Once you have gotten rid of the records you don't want and cleaned up the ones you can in your staging table, you import into the prod tables. Inserts should be written using the SELECT version of INSERT, not the VALUES clause, when you are writing more than one or two records.
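
As a rough illustration of that set-based clean-up, assuming the hypothetical staging table and column names sketched earlier on this page (not the real schema):

-- Throw out staging rows missing data in required fields.
DELETE FROM dbo.ImportStaging
WHERE LastName IS NULL OR LTRIM(RTRIM(LastName)) = '';

-- List lookup values in the file that have no match in the lookup table,
-- so they can be mapped, added, or used to reject records before the load.
SELECT DISTINCT s.AddressTypeName
FROM dbo.ImportStaging AS s
LEFT JOIN AddressType AS atype ON atype.AddressTypeName = s.AddressTypeName
WHERE atype.ID IS NULL;

-- The final load then uses the SELECT form of INSERT rather than row-by-row VALUES.
INSERT INTO [User] (Title, FirstName, LastName, Phone)
SELECT Title, FirstName, LastName, Phone
FROM dbo.ImportStaging;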

1 Comment

Actually it does work. In my last ETL project we handled half a million "master" rows with up to 120,000 details each by going through the masters one by one, then loading the details. Twenty minutes of processing. But I have to agree, the hardware we needed for that was quite impressive.
