I have a problem that I just can't solve on my own. There is a Python script that looks something like:

import threading
from threading import Thread

import pymssql

lock = threading.Lock()
global_list = []
max_workers = 1
workers_online = 0

def get_query(my_query, var1, list1..):
    global workers_online, global_list
    connection = pymssql.connect(params)
    cursor = connection.cursor()
    cursor.execute(my_query)
    records = cursor.fetchall()
    cursor.close()
    # do something with records
    with lock:
        workers_online -= 1
        global_list.append('some data')

def main():
    global workers_online
    # busy-wait until a worker slot is free
    while workers_online >= max_workers:
        pass
    workers_online += 1
    for query in main_list:
        th = Thread(target=get_query, args=(query, var1, list1..), daemon=False)
        th.start()
        # th.join()

The script sends queries to an MSSQL database using the "pymssql" library in a for loop. Each query returns approximately 10 to 1000 rows, but some are large, about 100k rows. On the server side, everything works quickly: 1-10 seconds per request.

The script runs on my local machine: Windows 10 Pro x64, i5-11300H, 16 GB RAM.

But here's the problem.
If I don't use threading at all (no threading library in the script), or use one thread with th.join() (commented out in the code), then everything works quickly.
As soon as I use more than one thread and stop using join, the line records = cursor.fetchall() begins to take a very long time on queries that return ~10k rows or more. If I reduce the query results with 'SELECT TOP 1000' On the MSSQL server side, the request hangs with the status of execution = 'ASYNC_NETWORK_IO' and no memory consumption.
It seems like insufficient resources on my side, not the server side.
Of course I can uncomment #th.join(), but then there won't be any difference from the script without threading at all.

I'm trying to understand why this happens. Perhaps someone has already encountered this. Is it possible to modify the existing script or use a different approach to implement this idea?
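
One possible direction, offered as a sketch rather than a confirmed fix: the `while workers_online >= max_workers: pass` loop spins in pure Python and keeps competing for the global interpreter lock, so it may slow down the thread that is building row objects inside `fetchall()`. A bounded pool waits without spinning. The example uses `concurrent.futures.ThreadPoolExecutor`; `fetch_rows` is a hypothetical stand-in for the real `pymssql` connect/execute/fetch sequence and only simulates a delay, and the query names are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_rows(query):
    """Hypothetical stand-in for connect/execute/fetchall; returns fake rows."""
    time.sleep(0.05)  # simulated network and fetch time
    return [(query, i) for i in range(3)]


def run_all(queries, max_workers=4):
    # The executor queues the work and idle threads block instead of
    # spinning, so no manual workers_online counter is needed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input queries.
        return list(pool.map(fetch_rows, queries))


results = run_all(["q1", "q2", "q3"], max_workers=2)
print(len(results))
```

In the real script each worker would presumably open its own connection, as `get_query` already does, and return its records instead of appending to a shared list.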

  • Is this Python on Linux? pymssql on Linux is backed by FreeTDS and unixODBC, see if Threading in unixODBC makes a difference. Do you get better performance if you use pyodbc and either ODBC Driver 17 for SQL Server or ODBC Driver 18 for SQL Server? Commented Nov 21, 2023 at 6:11
  • @AlwaysLearning Hello! I run it on Windows 10 Pro x64, i5-11300H, 16 GB RAM. No, I haven't tried pyodbc yet. Could it help? Commented Nov 21, 2023 at 6:53
  • Race your horses and see? Commented Nov 21, 2023 at 7:15
  • @AlwaysLearning I checked pyodbc with ODBC Driver 17 for SQL Server - the same story - it freezes on data sets with a large number of rows. With ODBC Driver 18, the connection is not established. Commented Nov 21, 2023 at 10:02
  • Uhh, I haven't seen threads in Python for a while. Why not use multiprocessing? It's much less awkward. Also, it's wrong to join a thread in the same loop where you create it. You should put all your threads into an array and join them after they have all been started; otherwise this will behave like an unthreaded application. Commented Nov 21, 2023 at 10:53
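
The last comment's point about `join` can be sketched as follows: start every thread first, keep the `Thread` objects in a list, and join them in a second loop. This is a minimal illustration; `worker` and its `time.sleep` delay are placeholders, not the original `get_query`.

```python
import threading
import time

results = []
lock = threading.Lock()


def worker(query):
    """Placeholder for the real query function; sleeps to simulate I/O."""
    time.sleep(0.2)
    with lock:  # protect the shared list
        results.append(query)


def run_threads(queries):
    threads = [threading.Thread(target=worker, args=(q,)) for q in queries]
    for th in threads:
        th.start()  # start all threads before waiting on any of them
    for th in threads:
        th.join()   # wait only after every thread has been started


start = time.perf_counter()
run_threads(["q1", "q2", "q3"])
elapsed = time.perf_counter() - start
```

With `join()` inside the starting loop the three simulated delays would run back to back; here they overlap, so the total time stays close to a single delay.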
