3

I have to apply simple formatting to 10,000+ Excel files. I already have a multicore program working. It opens a new Excel instance for every Excel file. If it matters, this code is currently inside of my Form code.

I was hoping to use one Excel instance with many workbooks. Is it possible to make use of multicore capabilities if I only use one instance of Excel? How?

-If the answer to the above is no, a more complicated question might be: How many instances of Excel should I spawn, and how can I split the workbooks between each instance?

Current code here:

private void SelectFilesButtonClick(object sender, EventArgs e)
{
   var listOfExcelFiles = OpenExcel.FileNames.ToList();
   Parallel.ForEach(listOfExcelFiles, TrivialExcelEditFunction);
}

private void TrivialExcelEditFunction(string file)
{
   //Open instance of Excel
   //Do processing
   //Close instance of Excel
}

Updated code below, but still doesn't limit itself to the proper number of cores. Not sure why.

private void SelectFilesButtonClick(object sender, EventArgs e)
{
   var listOfExcelFiles = OpenExcel.FileNames.ToList();
   int cores = Environment.ProcessorCount;

   //Split one list into list of lists. Number of lists based on number of cpu cores
   List<List<object>> listOfLists = Split(listOfExcelFiles, cores);

   //Limits number of threads to number of cores
   Parallel.ForEach(listOfLists, new ParallelOptions { MaxDegreeOfParallelism = cores }, EditExcel);
}

private void TrivialExcelEditFunction(string file)
{
   //Open instance of Excel

   foreach (string file in files)
   {
       //Do processing
   }
   //Close instance of Excel
}

Lets say I have 4 cores. My thinking was to divide list of files into 4 equal lists, limit threads to 4, then I'd be able to process files in 4 instances of Excel. I thought that would mean TrivialExcelEditFunction would only run 4 times. Instead this function is being run anywhere between 14 and 27 times. Please tell me where I went wrong.

3 Answers 3

2

Create N tasks/threads to do the processing where "N" is the number of cores on your machine. Give each task/thread one instance of Excel.

You won't be able to control a single instance from multiple threads (at least not productively; it will only be working on one thread's task at a time), and creating so many instances of Excel is quite inefficient.

Sign up to request clarification or add additional context in comments.

4 Comments

Should I be making one instance per physical core or per logical core?
@Brandon Most likely logical, but in practice just try both and see which is faster.
Updated question with new code. Still no luck. Any chance you can provide a bit more assistance?
@Brandon Just scrape the code you're using to fire off the new instances. Create N threads/tasks, in each one create an instance of excel. Have that thread process 1/N files. Wait on all of the threads/tasks. Don't use Parallel.Foreach.
2

As @Servy pointed out you can create multiple threads with one Excel instance per thread.

Make sure each thread creates the the Excel object it's using.

Expect weirdness. With 10,000+ files you can probably expect minor issues in at least a few. If a hidden Excel instance tries to prompt the user it may appear frozen.

Word has some issues when using some functions that call into single-threaded COM objects, I suspect Excel may have some of those too. In Word these manifest in various ways, including the instance freezing or being closed.

If the formatting is really simple and your files are xmlx it may be feasible writing some code to apply the changes through OOXML SDK, which wouldn't require an actual Excel instance

2 Comments

Seems like the most logical way to handle errors when batch processing is to store which files encountered an error and notify the user which files they would need to do manually.
@Brandon that would be the best approach, yes, except you will need a watchdog thread to detect if some of your worker threads are frozen. And when you dismiss a COM object inappropriately you will get significant memory leaks
1

I think this is what @Servy is referring to above. I use this daily at the moment in a piece of code I am working with which does handle Excel and most certainly hasn't failed yet. Also make sure that you marshal you COM object correctly. A little extra info as well, Excel 2010 uses multi-core internally, watch out for performance (with large file especially).

var tasks = new Task[Environment.ProcessorCount];

for (int i = 0; i< Environment.ProcessorCount; i++)
{
    tasks [i] = Task.Factory.StartNew(() =>
    {
        // your Excel code here.                 
    });
}

Task.WaitAll(tasks);

1 Comment

Worked perfect. Without the overhead of having to repeatedly open and close Excel, I brought the processing time down from 1 file every 3 seconds to 5 files per second.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.