
I'm trying to repeatedly run a function that requires a few positional arguments and involves random number generation (to generate many samples of a distribution). For an MWE, I think this captures everything:

import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, k):
    # k is a dummy argument; map_async() needs an iterable to consume
    return np.random.rand(xsize, ysize)

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    np.random.seed()
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), range(nsamp))
    p.close()
    return np.array(out.get())

Note that the final positional argument to rarr() is just a dummy variable, since I am using map_async(), which requires an iterable. Now if I run %timeit clever_array(500, ncores=1) I get 208 ms, whereas %timeit clever_array(500, ncores=5) takes 149 ms. So some parallelism is definitely happening (the speedup isn't terribly impressive for this MWE, but it is decent in my real code).

However, I'm wondering a few things -- is there a more natural way to run the function many times than passing a dummy variable through an iterable to map_async()? Is there an obvious way to pass the xsize and ysize arguments to rarr() other than partial()? And is there any way to ensure different results from the different cores other than re-initializing np.random.seed() every time?

Thanks for any help!

1 Answer


Typically when we use multiprocessing we expect each invocation of a function to produce a different result, so it doesn't quite make sense to call the same deterministic function many times. To ensure the randomness of the sampling output, it is best to separate the random state (seed) from the function itself. The approach recommended by the official NumPy documentation is to use an np.random.Generator object, created via np.random.default_rng(seed). With that, we can modify your code to:

import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    # each task receives its own Generator (seeded 0..nsamp-1), so every sample draws from a distinct stream
    out = p.map_async(partial(rarr, xsize, ysize), map(np.random.default_rng, range(nsamp)))
    p.close()
    return np.array(out.get())
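If you want statistically independent streams (rather than merely differently-seeded ones), the NumPy docs suggest spawning child seeds from a single SeedSequence. As a minimal sketch of that idea (the seed keyword argument here is my own addition, not part of your original signature), this also uses starmap_async to pass xsize and ysize directly instead of partial():

```python
import numpy as np
import multiprocessing as mup

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None, seed=None):
    # spawn nsamp independent child seeds from one parent SeedSequence
    child_seeds = np.random.SeedSequence(seed).spawn(nsamp)
    args = [(xsize, ysize, np.random.default_rng(s)) for s in child_seeds]
    with mup.Pool(ncores) as p:  # Pool(None) uses all available cores
        # starmap_async unpacks each (xsize, ysize, rng) tuple, so no partial() is needed
        out = p.starmap_async(rarr, args)
        return np.array(out.get())
```

Passing an explicit seed also makes the whole run reproducible, while leaving seed=None gives fresh entropy each call.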

3 Comments

I'm assuming that np.random.seed() should be removed as the first line of the definition of clever_array? If so, this does indeed seem to solve a different problem which I didn't know I had :)
@user1451632 You are right, I've updated the solution! If you find it helpful please consider upvote and/or accept. Thanks :)
great, thanks -- also should point out that using a different map() function as the iterable is a good idea I hadn't previously considered
