
I'm trying to repeatedly run a function that requires a few positional arguments and involves random number generation (to generate many samples of a distribution). For an MWE, I think this captures everything:

import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, k):
    # k is a dummy argument; map_async() needs an iterable to consume
    return np.random.rand(xsize, ysize)

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    np.random.seed()
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), range(nsamp))
    p.close()
    return np.array(out.get())

Note that the final positional argument to rarr() is just a dummy variable, since I am using map_async(), which requires an iterable. Now if I run %timeit clever_array(500, ncores=1) I get 208 ms, whereas %timeit clever_array(500, ncores=5) takes 149 ms. So some parallelism is definitely happening (the speedup isn't terribly impressive for this MWE, but it is decent in my real code).

However, I'm wondering a few things -- is there a more natural way to run the function many times than passing a dummy variable through an iterable to map_async()? Is there an obvious way to pass the xsize and ysize arguments to rarr() other than partial()? And is there any way to ensure different results from the different cores other than re-initializing np.random.seed() every time?

Thanks for any help!

1 Answer


Typically when we use multiprocessing we expect each invocation of a function to produce a different result, so it doesn't quite make sense to call the same deterministic function many times. To ensure the randomness of the sampling output, it is best to separate the random state (seed) from the function itself. The approach recommended by the official NumPy documentation is to use an np.random.Generator object, created via np.random.default_rng(seed). With that, we can modify your code to:

import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    # each task receives its own Generator (seeded 0..nsamp-1), so every sample draws from a distinct stream
    out = p.map_async(partial(rarr, xsize, ysize), map(np.random.default_rng, range(nsamp)))
    p.close()
    return np.array(out.get())
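If you want statistically independent streams (rather than merely differently-seeded ones), the NumPy docs suggest spawning child seeds from a single SeedSequence. As a minimal sketch of that idea (the seed keyword argument here is my own addition, not part of your original signature), this also uses starmap_async to pass xsize and ysize directly instead of partial():

```python
import numpy as np
import multiprocessing as mup

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None, seed=None):
    # spawn nsamp independent child seeds from one parent SeedSequence
    child_seeds = np.random.SeedSequence(seed).spawn(nsamp)
    args = [(xsize, ysize, np.random.default_rng(s)) for s in child_seeds]
    with mup.Pool(ncores) as p:  # Pool(None) uses all available cores
        # starmap_async unpacks each (xsize, ysize, rng) tuple, so no partial() is needed
        out = p.starmap_async(rarr, args)
        return np.array(out.get())
```

Passing an explicit seed also makes the whole run reproducible, while leaving seed=None gives fresh entropy each call.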

3 Comments

I'm assuming that np.random.seed() should be removed as the first line of the definition of clever_array? If so, this does indeed seem to solve a different problem which I didn't know I had :)
@user1451632 You are right, I've updated the solution! If you find it helpful please consider upvote and/or accept. Thanks :)
great, thanks -- also should point out that using a different map() function as the iterable is a good idea I hadn't previously considered
