NumPy: random seed and multithreading cause differing results

Question

Tested on Python 3.7, NumPy 1.17.3:

It seems that random number generation with NumPy does not give consistent results when a fixed seed is combined with multithreading. The issue does not come up with SciPy. The following snippet shows the problem:

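import numpy as np
from scipy.stats import nbinom

from concurrent.futures import ThreadPoolExecutor, as_completed


def load_data_np():
    # Reseeds the global NumPy generator, which is shared by all threads.
    np.random.seed(0)
    return np.random.negative_binomial(5, 0.3, size=2)


def load_data_scipy():
    # random_state=0 seeds a generator for this call instead of np.random's global state.
    return nbinom.rvs(5, 0.3, size=2, random_state=0)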

Both functions should therefore always return the same numbers. But when the data is produced in a threaded loop...

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(load_data_np) for _ in range(1000)]
    print(np.diff([future.result() for future in as_completed(futures)]))

one can find values such as these in the NumPy output:

...
 [  4]
 [ -3]
 [-15]
 [ -3]
 [  5]
 [ -6]
 [  0]
 [  6]
 [  1]
 [-13]
 [ -7]
 [  3]
 [  6]
 [ -2]
 [ -1]
 [-11]
 [  3]
...

This suggests that another thread changed the shared global generator state partway through load_data_np, for example by calling np.random.seed(0) or drawing samples between this thread's seed call and its own draw, so the thread no longer reads the samples it expects from the stream. For comparison, the same test with SciPy:

from os import cpu_count  # needed for max_workers below

with ThreadPoolExecutor(max_workers=cpu_count()) as executor:
    futures = [executor.submit(load_data_scipy) for _ in range(1000)]
    print(np.diff([future.result() for future in as_completed(futures)]))

yields the same value on every iteration:

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...

So what is the proper way to do thread-safe random number generation with a fixed seed in NumPy? Googling the issue has only led me back to np.random.seed.

Cheers, Michael

fzn · Accepted Answer · 2019-11-15 12:20:16Z

1

I modified your load_data_np function so that it no longer uses np.random.seed.

As I found in another SO thread, np.random.seed is known not to be thread-safe, because it reseeds a global generator that all threads share. It's recommended to use your own instances of RandomState instead.
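To make the problem with the shared state concrete, here is a minimal sketch that replays one possible interleaving sequentially, without any threads. The "threads" A and B and the helper draw are hypothetical names used only for this illustration, and the ordering shown is just one of many that a real thread pool could produce.

```python
import numpy as np


def draw():
    # Same draw as in load_data_np, but without the reseed.
    return np.random.negative_binomial(5, 0.3, size=2)


# Undisturbed case: seed, then draw.
np.random.seed(0)
expected = draw()

# Sequential stand-in for two hypothetical threads A and B
# that both run "seed, then draw" on the shared global state.
np.random.seed(0)  # A seeds
np.random.seed(0)  # B seeds before A has drawn
a_result = draw()  # A gets the first pair after the seed
b_result = draw()  # B gets the next pair from the already advanced stream

print("expected:", expected)
print("A:", a_result)
print("B:", b_result)
```

A still reproduces the expected pair here, while B reads from further along the same stream, so its pair need not match.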

def load_data_np():
    rs = np.random.RandomState(0)
    return rs.negative_binomial(5, 0.3, size=2)

And the output now looks as expected

...
 [-11]
 [-11]
 [-11]
 [-11]
 [-11]
...

This should help.
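If you would rather check this programmatically than read the printed diffs, something like the following should work. It is only a sketch: all_results_identical is a helper name made up for this example, and the task count of 1000 simply mirrors the question.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_data_np():
    # Each call owns its generator, so no state is shared between threads.
    rs = np.random.RandomState(0)
    return rs.negative_binomial(5, 0.3, size=2)


def all_results_identical(n_tasks=1000):
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(load_data_np) for _ in range(n_tasks)]
        results = [tuple(f.result()) for f in futures]
    # A single distinct tuple means every task drew the same pair.
    return len(set(results)) == 1


print(all_results_identical())
```

Because every task builds its own RandomState(0), the result does not depend on how the threads are scheduled.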

edited Nov 15, 2019 at 12:20

answered Nov 15, 2019 at 11:59

fzn


1 Comment

Michael A Over a year ago

Thanks, this is exactly the solution. Is there any known downside to using RandomState overall instead of np.random.seed (e.g. performance, memory usage, etc.)? Otherwise it would seem to be better advice in general to rely on RandomState instead of np.random.seed.
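One way to look at the performance side of this is to time both variants yourself. Below is a rough, single-threaded sketch; with_global_seed, with_local_state and n_calls are names made up for this example, and the numbers will depend on the machine and the NumPy version, so treat them as indicative only.

```python
import timeit

import numpy as np


def with_global_seed():
    # Variant from the question: reseed the shared global generator.
    np.random.seed(0)
    return np.random.negative_binomial(5, 0.3, size=2)


def with_local_state():
    # Variant from the answer: build a fresh generator per call.
    rs = np.random.RandomState(0)
    return rs.negative_binomial(5, 0.3, size=2)


n_calls = 2000  # arbitrary repeat count chosen for this sketch
t_global = timeit.timeit(with_global_seed, number=n_calls)
t_local = timeit.timeit(with_local_state, number=n_calls)

print(f"global seed: {t_global / n_calls * 1e6:.1f} us per call")
print(f"local state: {t_local / n_calls * 1e6:.1f} us per call")
```

This measures only the per-call cost in one thread; it says nothing about memory usage or about behaviour under contention.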
