Tested on Python 3.7, NumPy 1.17.3.

It seems that random number generation with NumPy does not give consistent results when a fixed seed is combined with multithreading. The issue does not come up with SciPy. The following snippet shows the problem:

import numpy as np
from scipy.stats import nbinom

from concurrent.futures import ThreadPoolExecutor, as_completed


def load_data_np():
    # Reseed the global NumPy state, then draw two samples from it.
    np.random.seed(0)
    return np.random.negative_binomial(5, 0.3, size=2)


def load_data_scipy():
    # Pass the seed directly to SciPy instead of touching global state.
    return nbinom.rvs(5, 0.3, size=2, random_state=0)

These two functions should therefore always produce the same numbers. But when the data is produced in a threaded loop...

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(load_data_np) for i in range(1000)]
    print(np.diff([future.result() for future in as_completed(futures)]))

one can find values such as these in the NumPy output:

...
 [  4]
 [ -3]
 [-15]
 [ -3]
 [  5]
 [ -6]
 [  0]
 [  6]
 [  1]
 [-13]
 [ -7]
 [  3]
 [  6]
 [ -2]
 [ -1]
 [-11]
 [  3]
...

This must mean that, somewhere between the call to np.random.seed(0) and the draw of the 2 samples (size=2), another thread reseeded or advanced the shared global state, which throws the other threads off in their position in the random sequence. For comparison, here is the same loop with SciPy:

from os import cpu_count

with ThreadPoolExecutor(max_workers=cpu_count()) as executor:
    futures = [executor.submit(load_data_scipy) for i in range(1000)]
    print(np.diff([future.result() for future in as_completed(futures)]))

This yields the same value on every iteration:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...

So what is the proper way to do thread-safe random number generation with a fixed seed in NumPy? Googling the issue has only led me back to np.random.seed.

Cheers, Michael

1 Answer

I modified your load_data_np function so that it no longer uses np.random.seed.

As I found in another SO thread, np.random.seed is known not to be thread-safe, and it is recommended to use your own instances of RandomState instead.

def load_data_np():
    rs = np.random.RandomState(0)
    return rs.negative_binomial(5, 0.3, size=2)

The output now looks as expected:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...

This should help.
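To check the fix end to end, here is a minimal self-contained sketch. The helper name run_threaded and its n_tasks parameter are my own additions for illustration and are not from the question. It runs the RandomState version in a thread pool and compares every result with the first one.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_data_np(seed=0):
    # A private RandomState per call: no generator state is shared between threads.
    rs = np.random.RandomState(seed)
    return rs.negative_binomial(5, 0.3, size=2)


def run_threaded(n_tasks=1000):
    # Hypothetical helper: submit the same task n_tasks times and collect the draws.
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(load_data_np) for _ in range(n_tasks)]
        return np.array([future.result() for future in futures])


results = run_threaded()
# Reproducible draws mean that every row equals the first row.
print(bool((results == results[0]).all()))
```

Since each call builds and seeds its own generator, the result no longer depends on how the threads are scheduled.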

1 Comment

Thanks, this is exactly the solution. Is there any known downside to using RandomState overall instead of np.random.seed (e.g. performance, memory usage, etc.)? Otherwise it would seem to be better advice in general to rely on RandomState instead of np.random.seed.
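A related option, not mentioned in the thread: if each task should get its own reproducible stream rather than the same numbers, NumPy 1.17 and later provide np.random.SeedSequence and np.random.default_rng. The sketch below is an illustration under that assumption. The names draw and run and the parameters n_tasks and root_seed are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def draw(child_seed):
    # Each task builds its own Generator from its own child seed.
    rng = np.random.default_rng(child_seed)
    return rng.negative_binomial(5, 0.3, size=2)


def run(n_tasks=8, root_seed=0):
    # spawn() derives one child seed per task from the single root seed.
    children = np.random.SeedSequence(root_seed).spawn(n_tasks)
    with ThreadPoolExecutor() as executor:
        # map() returns results in submission order, not completion order.
        return np.array(list(executor.map(draw, children)))


first = run()
second = run()
# Two runs with the same root seed should agree, whatever the thread scheduling.
print(np.array_equal(first, second))
```

Here the only shared input is the root seed. Every task owns its generator, so no locking is needed.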
