
TL;DR: Why does a single-threaded run of this code take roughly 80,000ms, while the multithreaded version takes over three times as long?

Bit of a long-winded question, and it may well have a much simpler resolution than the one I've found.

I'm trying to split one DataTable into x DataTables, based on a hard-coded variable. That's not where the issue arises, although if anyone has a cleaner solution than mine, I'd be very appreciative of help in that respect.

My issue is that, even though I generate x BackgroundWorkers, my results still show that it's not advantageous to break the main table into multiple tables.

The idea behind this is simple: we have an application that can only run a certain number of concurrent connections, let's say 10. I want to take the initial DataTable of, say, 150,000 rows and, since I have 10 connections, split it into 10 DataTables of 15,000 rows each, then process each table individually under its own connection, rather than pushing all 150,000 rows through a single DataTable on one connection.

So far, this is what I've come up with:

Private Sub CheckJobcodesPendingUpdate()
    Jobcode_AlreadyTried = New List(Of Integer)
    Dim sw = Stopwatch.StartNew()
    RTB.AppendText("I'm starting..." & vbCrLf)

    Dim Jobcodes As DataTable = SQL.SQLdataTable("SELECT [Jobcode] FROM [database].[schema].[Jobcodes]")
    sw.Stop
    RTB.AppendText("Took " & sw.ElapsedMilliseconds & "ms to retrieve " & Jobcodes.Rows.Count & " rows." & vbCrLf)

    Application.DoEvents

    sw = Stopwatch.StartNew()

    Dim ds As New DataSet
    Dim dt As DataTable

    Dim tableSeperator As Integer = CInt(Jobcodes.Rows.Count / 10) '10 = the amount of connections we can have simultaneously.
    Dim tableCount As Integer = 0
    tableCount = CInt(Math.Ceiling(Jobcodes.Rows.Count / tableSeperator))

    Do Until tableCount = 0
        dt = (From t In Jobcodes.AsEnumerable() Order By t.Item("Jobcode") Ascending Select t).Take(tableSeperator).CopyToDataTable
        For Each row As DataRow In dt.Rows
            Jobcodes.Rows.Remove(Jobcodes.AsEnumerable.First(Function(r) r.Item("Jobcode") = row.Item("Jobcode")))
        Next
        ds.Tables.Add(dt)
        tableCount -= 1
    Loop

    sw.Stop

    RTB.AppendText(vbCrLf & "Took " & sw.ElapsedMilliseconds & "ms to create all " & ds.Tables.Count & " tables.")

    For Each table As DataTable In ds.Tables
        Dim WorkerJobcodes As New BackgroundWorker
        AddHandler WorkerJobcodes.DoWork, AddressOf Async_Project
        AddHandler WorkerJobcodes.RunWorkerCompleted, AddressOf WorkCompleted
        WorkerJobcodes.RunWorkerAsync(table)
    Next

End Sub

I'm not a fan of dumping a code-block and asking 'solve this'. This is the main method that is called, and the BackgroundWorker simply processes each of the rows into the system.
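
For what it's worth, I suspect the splitting loop above could be compacted to something like this (just an untested sketch of the same idea; it keeps the source table intact rather than removing rows from it, and assumes a reference to System.Data.DataSetExtensions for AsEnumerable/CopyToDataTable):

    Dim connectionCount As Integer = 10 'The amount of connections we can have simultaneously.
    Dim chunkSize As Integer = CInt(Math.Ceiling(Jobcodes.Rows.Count / connectionCount))

    'Sort once, then peel off one chunk of rows per table.
    Dim sortedRows = Jobcodes.AsEnumerable().OrderBy(Function(r) r("Jobcode")).ToList()

    Dim ds As New DataSet
    For i As Integer = 0 To sortedRows.Count - 1 Step chunkSize
        ds.Tables.Add(sortedRows.Skip(i).Take(chunkSize).CopyToDataTable())
    Next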

This all works, but when I timed it using the 10 separate BackgroundWorkers it took 262,597ms, whereas on a single (main) thread it took 80,007ms.

Am I misunderstanding the concept of the BackgroundWorker, hence the performance hit? Or am I using the wrong tool for the job, or using it incorrectly?

Thanks in advance.

4 Comments
  • Pushing work like this to threads is useful when you are performing CPU-bound operations, but when you're doing IO you create resource conflicts that slow the whole thing down. Your hard drive is simultaneously trying to access different parts of the drive when you use multiple threads; when you only use one, it is free to read the data sequentially, avoiding all of the seek delays. Commented Nov 24, 2015 at 3:04
  • Oh, and please never, ever, ever use Application.DoEvents. It's bad voodoo kept for VB6 compatibility. Commented Nov 24, 2015 at 3:08
  • +1 to your comments. Haha, I never do use Application.DoEvents, it was solely to see the text appear from a locked up thread in realtime. Production code doesn't have these in there :) Commented Nov 24, 2015 at 3:27
  • If you can post your first comment as the answer, I'll accept it, as it's a very clear explanation of why my code was taking longer multithreaded than single-threaded. Thank you! Commented Nov 24, 2015 at 3:28

1 Answer


Pushing work like this to threads is useful when you are performing CPU-bound operations, but when you're doing IO you create resource conflicts that slow the whole thing down. Your hard drive is simultaneously trying to access different parts of the drive when you use multiple threads; when you only use one, it is free to read the data sequentially, avoiding all of the seek delays.
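
If the per-row work ever does turn out to be CPU-bound, you'd normally reach for Parallel.ForEach with a capped degree of parallelism rather than wiring up BackgroundWorkers by hand. A rough sketch only, where ProcessTable is a hypothetical stand-in for whatever your Async_Project handler does with each table:

    'Sketch only: ProcessTable is a placeholder for your own per-table work.
    'MaxDegreeOfParallelism keeps the thread count in line with your connection limit.
    Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 10}

    Parallel.ForEach(ds.Tables.Cast(Of DataTable)(), options,
                     Sub(table) ProcessTable(table))

Even then, if each worker spends its time waiting on the same database or disk, capping the parallelism only limits the damage; it doesn't remove the underlying IO contention.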


1 Comment

Thank you for this, Enigmativity. I was sure I had yet to fully understand BackgroundWorkers, and threading in general. +1.
