
I have implemented an application that reads data from a CSV file and inserts it into a SQL Server database; I am using LINQ to SQL. I also have a requirement to skip records that fail validation. To achieve this I use a loop and, inside the loop, call SubmitChanges().

Problem: the app works for a small number of records (<100), but in reality I will be getting CSV files of 3–4 lakhs (300,000–400,000) of records. When I ran my app against these big files, it took 5–6 hours.

Please suggest a better approach.


3 Answers


LINQ to SQL is great for getting data OUT of the database, or for validation and a small handful of inserts/updates at once. But for what you're doing (ETL), it sounds like you need to look into the SqlBulkCopy class. Go ahead and use your L2S objects to do the validation, but then, instead of submitting the changes, map the objects into a good old-fashioned ADO.NET DataTable and, every 1,000 records or so, bulk insert them.
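
Here is a minimal sketch of that approach. SqlBulkCopy is the standard System.Data.SqlClient class, but the Record entity, its columns, and the dbo.Records target table are hypothetical stand-ins for your own schema:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical validated entity; substitute your own L2S class.
    public class Record { public int Id; public string Name; }

    public static class BulkLoader {
        public static void BulkInsert(string connectionString, IEnumerable<Record> validRecords){
            // Build a DataTable whose columns mirror the destination table.
            DataTable table = new DataTable();
            table.Columns.Add("Id", typeof(int));
            table.Columns.Add("Name", typeof(string));

            using(SqlBulkCopy bulk = new SqlBulkCopy(connectionString)){
                bulk.DestinationTableName = "dbo.Records"; // hypothetical target table
                bulk.BatchSize = 1000; // rows sent per server round trip

                foreach(Record r in validRecords){
                    table.Rows.Add(r.Id, r.Name);
                    if(table.Rows.Count >= 1000){
                        bulk.WriteToServer(table); // flush the current batch
                        table.Clear();
                    }
                }

                if(table.Rows.Count > 0)
                    bulk.WriteToServer(table); // flush the remainder
            }
        }
    }

Validate each record with your L2S objects first and only add the rows that pass; SqlBulkCopy bypasses the change tracker entirely, which is where the speedup comes from.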



If performance is a big concern, LINQ to SQL might not be the right tool for the job. However, before tossing LINQ to SQL out the door, you might consider the following:

  • Try creating a new DataContext after a certain number of records (see the sketch below). The DataContext caches every entity you send to or retrieve from the database, which leads to a growing memory footprint and, eventually, an out-of-memory condition.
  • Use SQL Profiler to see what queries LINQ to SQL sends to the database. Possibly LINQ to SQL also queries the database for each entity you create.
  • Try to tune the database for inserts. This might be difficult, but you could write those records to an intermediate table (with fewer dependencies) and use a stored procedure to move the data to its final destination.

Bulk inserting is something O/RMs are not good at, so you might need to take a different approach.
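
As a rough sketch of the first suggestion (recreating the DataContext periodically), assuming hypothetical MyDataContext, MyLinqClass, and ConvertToMyLinqClass placeholders for the question's own types:

    public void ImportInBatches(DataTable dt){
        const int batchSize = 1000;
        MyDataContext context = new MyDataContext();
        int pending = 0;

        foreach(DataRow dr in dt.Rows){
            MyLinqClass entity = ConvertToMyLinqClass(dr); // hypothetical mapper
            if(!entity.IsValid())
                continue; // skip records that fail validation

            context.MyLinqClasses.InsertOnSubmit(entity);

            if(++pending == batchSize){
                context.SubmitChanges();
                context.Dispose();             // discard the tracked entities
                context = new MyDataContext(); // fresh context, empty identity map
                pending = 0;
            }
        }

        context.SubmitChanges(); // flush the final partial batch
        context.Dispose();
    }

The key difference from committing on a single context is the Dispose/recreate step, which keeps the identity map from growing without bound over a multi-hour run.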



If you have to do the inserts using LINQ to SQL, you may want to do intermittent commits. Something like this:

    public void LoadLargeDataUsingLinqToSql(string pathToCSV){
        DataTable dt = LoadMyCSVToDataTable(pathToCSV);
        int myPerformanceCounter = 0;
        foreach(DataRow dr in dt.Rows){ // Rows is a property, not a method
            MyLinqClass m = ConvertDRToMyLinqClass(dr);
            if(m.IsValidAndReadyToBeSaved()){
                MyDataContext.MyLinqClassRef.InsertOnSubmit(m);
                myPerformanceCounter++;
            }

            if(myPerformanceCounter > 25000){
                //Commit to clear cache.
                MyDataContext.SubmitChanges();
                myPerformanceCounter = 0;
            }
        }
        //Commit leftovers
        MyDataContext.SubmitChanges();
    }

5 Comments

SubmitChanges does not clear the DataContext's internal cache. It will hold on to all objects, even the submitted ones. You should create a new DataContext after calling SubmitChanges().
By cache I didn't mean the DataContext cache; I meant a logical cache of objects pending commit to the database. It's not always feasible to create a new DataContext unless you have a very strict 'unit of work' implementation. SqlBulkCopy is the preferred approach for a typical data-load program, but if you want to stick to a pure object-based model then this approach works fine, inserting a few hundred thousand objects in under a minute.
Refer to blogs.msdn.com/b/dinesh.kulkarni/archive/2008/07/01/…. L2S caching is not a very costly cache.
"L2S caching is not a very costly cache". I agree with that. However, sooner or later the OP will experience out of memory exceptions, when he runs that operation for 5 to 6 hours with a single DataContext, because of the caching behavior.
@Steven - As a best practice I would agree we should recycle the DataContext as soon as we are done with it, though I have a different experience with out-of-memory exceptions. I have a Windows service that loads hundreds of feeds daily; for a specific reason we only recycle the DataContext once every 24 hours, and we have never faced an out-of-memory situation.
