
Please see the following situation:

I have a CSV file from which I import a couple of fields (not all of them) into SQL Server, using Entity Framework with the Unit of Work and Repository design patterns.

var newGenericArticle = new GenericArticle
{
    GlnCode = data[2],
    Description = data[5],
    VendorId = data[4],
    ItemNumber = data[1],
    ItemUOM = data[3],
    VendorName = data[12]
};

var unitOfWork = new UnitOfWork(new AppServerContext());
unitOfWork.GenericArticlesRepository.Insert(newGenericArticle);

unitOfWork.Commit();

Now, the only way to uniquely identify a record is by checking 4 fields: GlnCode, Description, VendorId and ItemNumber.

So, before I can insert a record, I need to check whether or not it exists:

 var unitOfWork = new UnitOfWork(new AppServerContext());

 // If the article is already existing, update the vendor name.
 if (unitOfWork.GenericArticlesRepository.GetAllByFilter(
         x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
              x.Description.Equals(newGenericArticle.Description) &&
              x.VendorId.Equals(newGenericArticle.VendorId) &&
              x.ItemNumber.Equals(newGenericArticle.ItemNumber)).Any())
 {
     var foundArticle = unitOfWork.GenericArticlesRepository.GetByFilter(
         x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
              x.Description.Equals(newGenericArticle.Description) &&
              x.VendorId.Equals(newGenericArticle.VendorId) &&
              x.ItemNumber.Equals(newGenericArticle.ItemNumber));

     foundArticle.VendorName = newGenericArticle.VendorName;

     unitOfWork.GenericArticlesRepository.Update(foundArticle);
 }

If it exists, I need to update it, which you can see in the code above.

Now, you need to know that I'm importing around 1,500,000 records, so quite a lot. And it's the filter that causes the CPU to reach almost 100%.

The `GetAllByFilter` method is quite simple and does the following:

return !Entities.Any() ? null : !Entities.Where(predicate).Any() ? null : Entities.Where(predicate).AsQueryable();

where `predicate` is of type `Expression<Func<TEntity, bool>>`.

Is there anything that I can do to make sure that the server's CPU doesn't reach 100%?

Note: I'm using SQL Server 2012

Kind regards

  • I would suggest to use a stored procedure, or try a bulk insert extension Commented Apr 14, 2015 at 14:04
  • Why are you so .Any() happy? You are literally querying the database 10 times for every insert. Now, granted, that Any tends to use an EXISTS query, but it's still a query. In particular, you call Entities.Any(), then Any on the predicate, then return an iqueryable and then call Any on that again! Sheesh. Commented Apr 15, 2015 at 17:40
  • But beyond that, EF is just not designed for this.. It's not a batch or bulk job processor... Use SqlBulkCopy class instead. Commented Apr 15, 2015 at 17:43
  • @ErikFunkenbusch Any suggestion on how to get rid of the Any() implementation to make it more performant? Commented Apr 16, 2015 at 6:34
  • Yes, just do a single query with a where clause and your four conditions with a SingleOrDefault (assuming it can only return a single record), and if it's null it means it doesn't exist, so skip the update. Commented Apr 16, 2015 at 6:42
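Following the last comment's suggestion, here is a minimal sketch of collapsing the repeated `Any()`/`GetByFilter` calls into a single round trip. It assumes the repository exposes a `SingleOrDefaultByFilter`-style method (that name is hypothetical; your repository API may differ):

```csharp
// Sketch only: SingleOrDefaultByFilter is an assumed repository method that
// runs one SingleOrDefault query instead of three separate Any()/Where() queries.
var foundArticle = unitOfWork.GenericArticlesRepository.SingleOrDefaultByFilter(
    x => x.GlnCode == newGenericArticle.GlnCode &&
         x.Description == newGenericArticle.Description &&
         x.VendorId == newGenericArticle.VendorId &&
         x.ItemNumber == newGenericArticle.ItemNumber);

if (foundArticle == null)
{
    // Not found: insert the new article.
    unitOfWork.GenericArticlesRepository.Insert(newGenericArticle);
}
else
{
    // Found: only the vendor name needs updating.
    foundArticle.VendorName = newGenericArticle.VendorName;
    unitOfWork.GenericArticlesRepository.Update(foundArticle);
}
```

This turns up to ten queries per record into exactly one lookup plus one insert or update.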

3 Answers


Wrong tool for the task. You should never process a million-plus records one at a time. Insert the records into a staging table using bulk insert, clean them if need be, and then use a stored procedure to do the processing in a set-based way, or use the tool designed for this: SSIS.
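As a rough sketch of the staging-table load with `SqlBulkCopy` (the table name `dbo.GenericArticlesStaging`, the `csvRows` variable, and the column layout are assumptions for illustration):

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: bulk-load the parsed CSV rows into a staging table in one streamed
// operation instead of 1.5M individual EF inserts. Names are assumed.
var table = new DataTable();
table.Columns.Add("GlnCode", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Columns.Add("VendorId", typeof(string));
table.Columns.Add("ItemNumber", typeof(string));
table.Columns.Add("ItemUOM", typeof(string));
table.Columns.Add("VendorName", typeof(string));

foreach (var data in csvRows) // csvRows: your parsed CSV lines (string[] each)
{
    table.Rows.Add(data[2], data[5], data[4], data[1], data[3], data[12]);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.GenericArticlesStaging";
        bulkCopy.BatchSize = 10000; // stream in chunks rather than one huge transaction
        bulkCopy.WriteToServer(table);
    }
}
```

For very large files you can also build the `DataTable` in chunks and call `WriteToServer` per chunk to keep memory bounded.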




I've found another solution which wasn't proposed here, so I'll be answering my own question.

I will have a temp table into which I will import all the data, and after the import, I'll execute a stored procedure which runs a MERGE command to populate the destination table. I believe this is the most performant approach.
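As a sketch, the stored procedure could be built around a `MERGE` on the four key columns and invoked once from EF after the load. The procedure and table names below are assumptions:

```csharp
// Sketch of the set-based upsert. The stored procedure (name assumed) would
// contain roughly this T-SQL:
//
//   MERGE dbo.GenericArticles AS target
//   USING dbo.GenericArticlesStaging AS source
//     ON  target.GlnCode     = source.GlnCode
//     AND target.Description = source.Description
//     AND target.VendorId    = source.VendorId
//     AND target.ItemNumber  = source.ItemNumber
//   WHEN MATCHED THEN
//     UPDATE SET target.VendorName = source.VendorName
//   WHEN NOT MATCHED THEN
//     INSERT (GlnCode, Description, VendorId, ItemNumber, ItemUOM, VendorName)
//     VALUES (source.GlnCode, source.Description, source.VendorId,
//             source.ItemNumber, source.ItemUOM, source.VendorName);
//
// One call from EF replaces 1.5M round trips:
using (var context = new AppServerContext())
{
    context.Database.ExecuteSqlCommand("EXEC dbo.MergeGenericArticles");
}
```

The existence check, insert and update all happen server-side in a single set-based statement, which is exactly what was killing the CPU when done row by row through LINQ.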



Have you indexed on those four fields in your database? That is the first thing that I would do.

Ok, I would recommend trying the following: Improving bulk insert performance in Entity framework

To summarize: do not call SaveChanges() after every insert or update. Instead, call it every 1,000-2,000 records so that the inserts/updates are sent to the database in batches.

Also, optionally change the following parameters on your context:

yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
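A sketch of that batching approach (the batch size of 1,000 and recreating the context between batches are choices, not requirements; `articlesToImport` stands in for your parsed CSV rows):

```csharp
// Sketch: accumulate inserts and flush every 1,000 records. Recreating the
// context between batches keeps the change tracker from growing unbounded.
const int batchSize = 1000;
var context = new AppServerContext();
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

int count = 0;
foreach (var article in articlesToImport)
{
    context.GenericArticles.Add(article);
    if (++count % batchSize == 0)
    {
        context.SaveChanges();
        context.Dispose();
        context = new AppServerContext();
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;
    }
}
context.SaveChanges(); // flush the final partial batch
context.Dispose();
```

With change detection disabled, remember that EF no longer notices modifications to already-tracked entities automatically, so this suits insert-heavy imports best.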


  • Not done it yet. Will do right away. Don't understand how I missed that. Any other ideas?
  • The fields are indexed right now, but the problem remains the same. I have a table with a 4-column key (the 4 columns that make a record unique). I've created an index on those 4 columns, but the problem still remains.
  • In that case you might want to use a stored procedure with a MERGE statement. You can call stored procedures from Entity Framework.
  • I will give that a try, but is that such a performance boost when I still need to execute the SP for every record in the file to import? Meaning 1,500,000 calls to the stored procedure?
  • No, you can use a table-valued parameter and send it in batches. mikesdotnet.wordpress.com/2013/03/17/…
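A sketch of the table-valued-parameter call mentioned in that comment (the user-defined table type `dbo.GenericArticleType`, the procedure name, and `batchTable` — a `DataTable` whose columns match that type — are all assumptions):

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: send a whole batch of rows to the stored procedure as one
// table-valued parameter instead of one call per record. Names assumed.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.UpsertGenericArticles", connection))
{
    command.CommandType = CommandType.StoredProcedure;

    var parameter = command.Parameters.AddWithValue("@Articles", batchTable);
    parameter.SqlDbType = SqlDbType.Structured;     // mark the parameter as a TVP
    parameter.TypeName = "dbo.GenericArticleType";  // the user-defined table type

    connection.Open();
    command.ExecuteNonQuery();
}
```

Inside the procedure, the `@Articles` parameter can be used as the `USING` source of a MERGE, so each batch of thousands of rows costs a single round trip.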
