I was trying to parallelize the generation of normally distributed random arrays with numpy.random.normal(), but it looks like the calls from the threads are executed sequentially instead.

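import threading
import time

import numpy as np

MAX_THREADS = 12
slots = threading.Semaphore(MAX_THREADS)  # caps the number of running threads

def worker():
    try:
        np.random.normal(0, 2, 4000)
    finally:
        slots.release()  # free the slot for the next thread

start = time.time()

threads = []
for i in range(100000):
    slots.acquire()  # blocks while 12 threads are already running
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

for t in threads:  # wait for the stragglers before stopping the clock
    t.join()

end = time.time()
print(end - start)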

If I measure the time with a single thread, I get the same result as with 12 threads. I know this kind of parallelization suffers from a lot of overhead, but even so the multithreaded version should still save some time.

1 Answer

np.random.normal uses seeded generator state internally. This state belongs to NumPy's global legacy generator (a module-level RandomState) rather than to a per-thread object, so it is shared between threads, and it is not safe to rely on calling it from multiple threads (due to possible race conditions). In fact, the documentation provides examples for such a case (see here and there). There are two alternatives. You can use multiple processes, in which case you need to configure the seed so that different processes produce different results. Or you can use custom random number generators (RNGs), with a different RNG object in each thread.
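As a rough illustration of the last option, here is a minimal sketch that gives each thread its own generator. It assumes NumPy 1.17 or newer (for np.random.default_rng and np.random.SeedSequence); the function name draw_normal_chunks, its parameters, and the seed value are made up for this example and are not part of the question's code. Each thread also produces a batch of arrays instead of a single one, so the cost of starting threads is spread over more work.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def draw_normal_chunks(n_arrays=1200, size=4000, n_threads=12, seed=12345):
    """Hypothetical helper: draw n_arrays normal arrays using one RNG per thread."""
    # Spawn one independent child seed per thread from a single root seed.
    child_seeds = np.random.SeedSequence(seed).spawn(n_threads)
    rngs = [np.random.default_rng(s) for s in child_seeds]

    def worker(rng, count):
        # Each worker touches only its own generator, so no state is shared.
        return [rng.normal(0, 2, size) for _ in range(count)]

    # Split the work as evenly as possible across the threads.
    base, extra = divmod(n_arrays, n_threads)
    counts = [base + (1 if i < extra else 0) for i in range(n_threads)]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        chunks = list(pool.map(worker, rngs, counts))

    # Flatten the per-thread batches into one list of arrays.
    return [arr for chunk in chunks for arr in chunk]


if __name__ == "__main__":
    arrays = draw_normal_chunks()
    print(len(arrays), arrays[0].shape)
```

Whether this actually runs faster than the single-threaded version depends on the array size and the machine, so it is worth timing both. With arrays as small as 4000 elements, thread overhead may still dominate.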
