
I was experimenting with Python's multiprocessing and threading modules and stumbled upon something rather strange.

Have a look at the following program:

import threading
import time
import multiprocessing

target_count = int(100e6)

mutex = threading.Lock()

def thread_fn(thread_count):
    y = 0
    for i in range(target_count // thread_count):
        y += 1

x = 0
start = time.time()
for i in range(target_count):
    x += 1
end = time.time()

print(f"single thread time: {(end - start) * 1000}ms")

thread_count = 1
trds = []
start = time.time()
for i in range(thread_count):
    t = threading.Thread(target=thread_fn, args=(thread_count, ))
    t.start()
    trds.append(t)

for i in range(thread_count):
    trds[i].join()
end = time.time()

print(f"{thread_count} threads time: {(end - start) * 1000}ms")

The main thread increments a variable until it reaches target_count; similarly, the single worker thread running thread_fn() increments a local variable.

I was expecting them to take about the same time, but the result I get from running this code is:

single thread time: 9375.770568847656ms
1 threads time: 5802.408456802368ms

Is this due to erroneous code, or is something weird happening with the GIL?

Also, if I set thread_count = 8 and change thread_fn() as follows (I know that using a mutex here is nonsensical, but I happened to leave it in accidentally):

def thread_fn(thread_count):
    y = 0
    with mutex:
        for i in range(target_count // thread_count):
            y += 1

then running the program gives:

single thread time: 10064.738273620605ms
8 threads time: 6456.046581268311ms

Why do I get a speed boost from adding the mutex? I was expecting it to be slower due to unnecessary locking and unlocking, since, as far as I know, threads run sequentially because of the GIL.

And without the mutex, still with 8 threads:

single thread time: 10910.805225372314ms
8 threads time: 13055.136442184448ms
  • IDK about Python, but a smart C or C++ compiler would optimize your thread_fn() by having it do nothing at all except immediately return. Commented Aug 16, 2024 at 16:41

1 Answer


The difference you are seeing with a single worker thread -

single thread time: 9375.770568847656ms
1 threads time: 5802.408456802368ms

has nothing to do with threads - a worker thread would not be "faster" than the main thread. What does make a difference is using local variables inside a function, as compared to running your for loop at the module top level, where x is a global variable: function-local variable access is much more optimized than global variable access, which always requires a dictionary lookup in the namespace (whereas, in CPython, each local variable inside a function uses a single indexed slot, with the index embedded in the code object at compile time).
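That difference can be measured directly - this is a minimal sketch (with a smaller count so it runs quickly) that times the same increment loop once at module level on a global and once inside a function on a local:

```python
import time

N = 1_000_000

# Loop at module top level: x is a global, so every read and write
# goes through a dictionary lookup in the module namespace.
x = 0
start = time.perf_counter()
for i in range(N):
    x += 1
global_time = time.perf_counter() - start

def count(n):
    # y is a function local: CPython accesses it through an indexed
    # slot in the frame (LOAD_FAST/STORE_FAST), no dictionary lookup.
    y = 0
    for i in range(n):
        y += 1
    return y

start = time.perf_counter()
result = count(N)
local_time = time.perf_counter() - start

print(f"global loop: {global_time * 1000:.1f}ms, local loop: {local_time * 1000:.1f}ms")
```

On a typical CPython build the local-variable loop comes out noticeably faster, which accounts for the gap the question observed between the module-level loop and the one-thread run.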

As for why the timing using the mutex is much faster than without it: you have only a few bytecodes that depend on local context. Even without the mutex the code doesn't really run concurrently, but with the mutex held around the whole loop there are also no context switches on the Python side - the execution frame and other state in the Python VM stay fixed from the first value of i to the last. You don't say which Python version you are using, but this context switching should cost less in Python 3.12 (and less still in upcoming versions) than it did up to version 3.11 - if you run it with different interpreters there should be a noticeable difference.
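The frequency of those GIL handoffs between CPU-bound threads is itself tunable - a small sketch of the knob involved (note this only changes scheduling granularity; it does not add any parallelism):

```python
import sys

# How often the interpreter asks the running thread to release the GIL
# so another thread can be scheduled; typically 0.005 (5 ms) by default.
print(sys.getswitchinterval())

# Raising the interval means fewer GIL handoffs between CPU-bound
# threads, i.e. less of the context-switch overhead discussed above.
sys.setswitchinterval(0.05)
print(sys.getswitchinterval())
```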

Also, while this context switching adds to the overhead, entering the lock is done just once in each thread - that is 1 acquisition against 100,000,000 increments: its cost has to be negligible.

If you change your lock to be acquired inside the for i loop, then the same code should take much more time to run (still, this is the pattern we tend to write when using mutexes in real-world code):

def thread_fn(thread_count):
    y = 0
    for i in range(target_count // thread_count):
        with mutex:  
            y += 1

Still, if y were made a global variable that all worker threads add to, this usage of the mutex is the only one that would ensure a correct result while running concurrent code (all the increments occurring interleaved in a transparent way).
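As a sketch of that last point (assuming a module-level counter instead of the local y, and a smaller per-thread count so it finishes quickly), the per-increment lock keeps the shared global correct across all workers:

```python
import threading

counter = 0
lock = threading.Lock()
N_THREADS = 8
PER_THREAD = 100_000

def worker():
    global counter
    for _ in range(PER_THREAD):
        # counter += 1 is a read-modify-write on shared state; the lock
        # makes each increment atomic with respect to the other threads.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 800000
```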
