numpy.random.normal() in multithreaded environment

Question

I was trying to parallelize the generation of normally-distributed random arrays with the numpy.random.normal() function but it looks like the calls from the threads are executed sequentially instead.

import numpy as np

start = time.time()

active_threads = 0
for i in range(100000):
    t = threading.Thread(target=lambda x : np.random.normal(0,2,4000), args = [0])
    t.start()

    while active_threads >= 12:
        time.sleep(0.1)
        continue

end = time.time()
print(str(end-start))

If i measure the time for a 1 thread process i get the same result as the 12 thread one. I know this kind of parallelization suffers from a lot of overhead, but even then it should still get some time off with the multithread version.

Jérôme Richard · Accepted Answer · 2022-02-11 00:13:24Z

1

np.random.normal use a seed variable internally. This variable is retrieved from default_rng() and it is certainly shared between thread so it is not safe to call it using multiple threads (due to possible race conditions). In fact, the documentation provides examples for such a case (see here and there). Alternatively, you can use multiple processes (you need to configure the seed to get different results in different processes). Another solution is to use custom random number generators (RNG) so to use a different RNG object in each thread.

answered Feb 11, 2022 at 0:13

Jérôme Richard

53.4k6 gold badges48 silver badges77 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

numpy.random.normal() in multithreaded environment

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related