I have to run a one-off C# calculation on millions of rows of data and save the results in another table. I haven't worked with threading in C# for a couple of years. I'm using .NET v4.5 and EF v5.

The original code is something along the lines of:

public static void Main()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Entities db = new Entities();
    DoCalc(db.Clients.ToList());
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}

private static void DoCalc(List<Client> clients)
{
    Entities db = new Entities();
    foreach (var c in clients)
    {
        var transactions = db.GetTransactions(c);
        var result = calculate(transactions); // the actual calc
        db.Results.Add(result);
        db.SaveChanges();
    }
}

Here is my attempt at multi-threading:

private static int numberOfThreads = 15;

public static void Main()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Entities db = new Entities();

    var splitUpClients = SplitUpClients(db.Clients.ToList());

    // One task per subgroup; there can be fewer subgroups than numberOfThreads.
    Task[] allTasks = new Task[splitUpClients.Count];

    for (int i = 0; i < allTasks.Length; i++)
    {
        // Copy the loop variable so each lambda captures its own index.
        int index = i;
        allTasks[i] = Task.Factory.StartNew(() => DoCalc(splitUpClients[index]));
    }

    Task.WaitAll(allTasks);
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}

private static void DoCalc(List<Client> clients)
{
    Entities db = new Entities();
    foreach (var c in clients)
    {
        var transactions = db.GetTransactions(c);
        var result = calculate(transactions);
        db.Results.Add(result);
        db.SaveChanges();
    }
}

//splits the list of clients into n subgroups
private static List<List<Client>> SplitUpClients(List<Client> clients)
{
    int maxPerGroup = (int)Math.Ceiling((double)clients.Count / numberOfThreads);

    return clients.Select((c, i) => new { Client = c, Index = i }).
                        GroupBy(o => o.Index / maxPerGroup, o => o.Client).
                        Select(group => group.ToList()).
                        ToList();
}

My question is:

Is this the safe and correct way to do it and are there any obvious shortcomings (especially with regard to EF)?

Also, how do I find the optimum number of threads? Is it the more the merrier?

Use using, i.e. using (Entities db = new Entities()) { ... }, especially when you create the contexts on a thread. Commented Aug 28, 2013 at 10:39

2 Answers


The Entity Framework DbContext and ObjectContext classes are NOT thread-safe, so you should not use a single instance from multiple threads.

Although it seems like you're only passing entities to other threads, it's easy to get this wrong when lazy loading is involved: under the covers, the entity calls back into the context it was loaded from to fetch more data.

So instead, I would advise converting the list of entities to a list of special immutable data structures that contain only the data needed for the calculation. Those immutable structures should not have to call back into the context and should not be able to change. When you do this, it will be safe to pass them on to other threads to do the calculation.
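As a rough sketch of what such a structure could look like: the `ClientCalculationData` fields, the stand-in `Client` class and its `TransactionAmounts` property are assumptions made for illustration, and an in-memory list stands in for the entities loaded from the context.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical immutable snapshot: it holds only what the calculation needs
// and has no reference back to an EF context.
public sealed class ClientCalculationData
{
    private readonly int clientId;
    private readonly ReadOnlyCollection<decimal> amounts;

    public ClientCalculationData(int clientId, IEnumerable<decimal> amounts)
    {
        this.clientId = clientId;
        // Copy the values so later changes to the source cannot leak in.
        this.amounts = new ReadOnlyCollection<decimal>(amounts.ToList());
    }

    public int ClientId { get { return clientId; } }
    public IReadOnlyList<decimal> Amounts { get { return amounts; } }
}

// Stand-in for the EF entity (assumed shape, not the asker's real model).
public class Client
{
    public int Id { get; set; }
    public List<decimal> TransactionAmounts { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Stands in for entities materialised on the thread that owns the context.
        var clients = new List<Client>
        {
            new Client { Id = 1, TransactionAmounts = new List<decimal> { 10m, 20m } },
            new Client { Id = 2, TransactionAmounts = new List<decimal> { 5m } }
        };

        // Convert to snapshots before any other thread gets involved.
        List<ClientCalculationData> snapshots = clients
            .Select(c => new ClientCalculationData(c.Id, c.TransactionAmounts))
            .ToList();

        foreach (var s in snapshots)
        {
            Console.WriteLine(s.ClientId + ": " + s.Amounts.Sum());
        }
    }
}
```

Because the snapshot copies its data and exposes it read-only, worker threads can read it freely without triggering lazy loading or touching the context.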


2 Comments

Thanks for the answer. So in the example above I should create a DTO class to represent a client and pass a list of those to the DoCalc method? Is it OK to create a new Entity instance in each thread?
Yes, an immutable DTO containing all the data (but nothing more) that is needed for the calculation. If it represents a client, you should probably call it something like ClientCalculationData. Creating new entities on a thread will be fine, as long as you don't interact with the object context, but perhaps it's cleaner to let the calculation spit out immutable structures that you translate back to the entities that you wish to insert on the main thread.
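One possible shape for that last suggestion, as a sketch only: `CalculationResult`, `ResultEntity` and the summing `Calculate` are hypothetical stand-ins, PLINQ is used here simply as one convenient way to run the calculation on worker threads, and a plain list stands in for `db.Results`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable output of the calculation.
public sealed class CalculationResult
{
    private readonly int clientId;
    private readonly decimal total;

    public CalculationResult(int clientId, decimal total)
    {
        this.clientId = clientId;
        this.total = total;
    }

    public int ClientId { get { return clientId; } }
    public decimal Total { get { return total; } }
}

// Stand-in for the entity type stored in db.Results (assumed shape).
public class ResultEntity
{
    public int ClientId { get; set; }
    public decimal Total { get; set; }
}

public static class Program
{
    // Pure function: reads its input, returns a new object, touches no context.
    private static CalculationResult Calculate(KeyValuePair<int, decimal[]> input)
    {
        return new CalculationResult(input.Key, input.Value.Sum());
    }

    public static void Main()
    {
        // Client id -> transaction amounts, standing in for the immutable inputs.
        var inputs = new Dictionary<int, decimal[]>
        {
            { 1, new[] { 10m, 20m } },
            { 2, new[] { 5m } },
            { 3, new decimal[0] }
        };

        // Worker threads only ever run Calculate.
        List<CalculationResult> results = inputs
            .AsParallel()
            .Select(input => Calculate(input))
            .ToList();

        // Back on the calling thread: translate results into entities.
        var resultsTable = new List<ResultEntity>(); // stands in for db.Results
        foreach (var r in results)
        {
            resultsTable.Add(new ResultEntity { ClientId = r.ClientId, Total = r.Total });
        }
        // In the real program, db.SaveChanges() would be called here,
        // on the same thread that created the context.

        Console.WriteLine(resultsTable.Count + " results ready to save");
    }
}
```

With this split, only one thread ever talks to the context, so its lack of thread safety stops being a concern.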

Aside from the problems with Entity Framework that Steven has addressed, regarding numberOfThreads:

There is no need for this self-throttling. Go nuts, and let the ThreadPool do its job, which is to maintain a queue of tasks for you and decide on the number of concurrent threads. You don't need SplitUpClients or the foreach in DoCalc; start one task per client instead.
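A minimal sketch of that idea, assuming the per-client work is independent: the integer client ids and the arithmetic in `Calculate` are placeholders, and the results go into a `ConcurrentDictionary` rather than a database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for the per-client calculation.
    private static long Calculate(int clientId)
    {
        long sum = 0;
        for (int n = 1; n <= 1000; n++)
        {
            sum += (long)clientId * n;
        }
        return sum;
    }

    public static void Main()
    {
        int[] clientIds = Enumerable.Range(1, 100).ToArray();

        // Thread-safe store for results written from pool threads.
        var results = new ConcurrentDictionary<int, long>();

        // One task per client; the ThreadPool queues them and decides
        // how many run concurrently. No manual splitting into groups.
        Task[] tasks = clientIds
            .Select(id => Task.Factory.StartNew(() => { results[id] = Calculate(id); }))
            .ToArray();

        Task.WaitAll(tasks);
        Console.WriteLine(results.Count + " clients processed");
    }
}
```

With millions of clients, `Parallel.ForEach` over the same sequence may be a reasonable alternative, since it avoids holding one `Task` object per client while still leaving the degree of concurrency to the runtime.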
