
This may be a tricky question to ask, but what I have is a DataTable that contains 1000 rows. I want to process each of these rows on a new thread, but I want to limit the number of threads to 4. So basically I'm constantly keeping 4 threads running until the whole DataTable has been processed.

Currently I have this:

  foreach (DataRow dtRow in urlTable.Rows)
        {
            for (int i = 0; i < 4; i++)
            {
                Thread thread = new Thread(() => MasterCrawlerClass.MasterCrawlBegin(dtRow));
                thread.Start();
            }
        }

I know this is backwards, but I'm not sure how to achieve what I'm looking for. I thought of a very complicated while loop, but maybe that's not the best way? Any help is always appreciated.

Your code example starts 4 threads per row; I do not believe that is intentional. Commented Apr 8, 2012 at 18:33

1 Answer

The simplest solution, if you have 4 CPU cores, is Parallel LINQ with a degree of parallelism of 4, which gives you one thread per CPU core. Otherwise you have to distribute the records between threads/tasks manually. See both solutions below:

PLINQ solution:

// DataRowCollection is non-generic, so cast it to IEnumerable<DataRow> first.
urlTable.Rows.Cast<DataRow>()
             .AsParallel().WithDegreeOfParallelism(4)
             .Select(....)
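To make the PLINQ outline above concrete, here is a minimal sketch. It assumes a string column named `Url` and uses a hypothetical `Crawl` method as a stand-in for `MasterCrawlerClass.MasterCrawlBegin`; neither the column name nor the helper names come from the question. Because PLINQ queries are lazy, the sketch ends with `ToArray()` so the rows are actually processed.

```csharp
using System;
using System.Data;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical stand-in for MasterCrawlerClass.MasterCrawlBegin.
    public static string Crawl(DataRow row)
    {
        return "crawled " + (string)row["Url"];
    }

    public static string[] CrawlAll(DataTable urlTable)
    {
        return urlTable.Rows
            .Cast<DataRow>()              // DataRowCollection is non-generic
            .AsParallel()
            .WithDegreeOfParallelism(4)   // upper bound of 4 concurrent tasks
            .Select(row => Crawl(row))
            .ToArray();                   // forces the lazy query to execute
    }

    // Builds a small in-memory table; the "Url" column name is an assumption.
    public static DataTable BuildTable(int count)
    {
        var table = new DataTable();
        table.Columns.Add("Url", typeof(string));
        for (int i = 0; i < count; i++)
        {
            table.Rows.Add("http://example.com/" + i);
        }
        return table;
    }
}
```

Calling `PlinqSketch.CrawlAll(PlinqSketch.BuildTable(1000))` would process all 1000 rows with no more than 4 in flight at once. The results are not guaranteed to come back in row order unless `AsOrdered()` is added to the query.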

Manual distribution:

You can distribute the items across worker threads manually using a simple trick: thread N picks up every 4th item, starting at index N. For instance:

  • First thread: 0, then every 4th == 0, 4, 8...
  • Second: 1, then every 4th == 1, 5, 9...
  • Third: 2, then every 4th == 2, 6, 10...

Task Parallel Library solution:

// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Threading.Tasks;
private void ProcessItems(IEnumerable<string> items)
{
     // TODO: ..
}

var items = new List<string>(Enumerable.Range(0, 1000)
                                       .Select(i => i + "_ITEM"));
// Each worker takes every 4th item, starting at its own offset.
var items1 = items.Where((item, index) => index % 4 == 0);
var items2 = items.Where((item, index) => index % 4 == 1);
var items3 = items.Where((item, index) => index % 4 == 2);
var items4 = items.Where((item, index) => index % 4 == 3);

var tasks = new Task[]
    {
       Task.Factory.StartNew(() => ProcessItems(items1)),
       Task.Factory.StartNew(() => ProcessItems(items2)),
       Task.Factory.StartNew(() => ProcessItems(items3)),
       Task.Factory.StartNew(() => ProcessItems(items4))
    };

// Block until all four workers have finished.
Task.WaitAll(tasks);
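The static split above hands each worker a fixed quarter of the items, so a worker that finishes early sits idle. If the goal is to keep up to 4 rows in flight until the table is exhausted, one alternative (a different technique from the manual split, sketched here under assumptions) is `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism`. The `ThrottledSketch` class and the `processRow` delegate are hypothetical names; in the question's terms the delegate would wrap `MasterCrawlerClass.MasterCrawlBegin`.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSketch
{
    // Runs processRow once per row, with at most 4 rows processed concurrently.
    // Returns the number of rows processed.
    public static int ProcessAll(DataTable urlTable, Action<DataRow> processRow)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        int processed = 0;

        // Blocks until every row has been handled.
        Parallel.ForEach(urlTable.Rows.Cast<DataRow>(), options, row =>
        {
            processRow(row);
            Interlocked.Increment(ref processed); // thread-safe counter
        });

        return processed;
    }
}
```

A call might look like `ThrottledSketch.ProcessAll(urlTable, row => MasterCrawlerClass.MasterCrawlBegin(row));`. Note that `MaxDegreeOfParallelism` is an upper bound, so the scheduler may use fewer than 4 workers at times.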

MSDN:


4 Comments

Thanks, but I really just want to limit it.
So you just want to run 4 threads and no more, or what are you trying to limit?
The problem is that I need to limit it to 4 threads while still being able to iterate over the entire DataTable. Each row in the DataTable needs to be processed on its own thread, which means that at any given time I should be processing 4 rows.
Well, 4 threads run slower than one thread on a single core, unless they spend time waiting for data to process. I recommend that you let PLINQ or Threading.ThreadPool handle the number of threads and concentrate on the tasks that need to be processed.
