
Consider this very basic (and inefficient) code that uses a parallel foreach to generate random values:

library("doParallel")  # also attaches foreach and parallel
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:100) %dopar% rnorm(1)

Is this correct, or are additional steps needed for random number generation to work properly? I suspect it is enough, and quick checks seem to "prove" that the seeds work properly, but I'd like to be sure the same holds on other platforms, since I want the code to be portable.

  • This collapses into two important subtasks. A: making sure parallel calls to a PRNG work at all (thread safety, blocking and so on), where the safer approach is to use one PRNG per thread/process (I'm not sure what kind of parallelization is done here). B: (when separate PRNGs are used) making sure their seeds produce good random numbers. Many PRNGs have defects in this regard (e.g. Mersenne Twister initialized with seeds 0, 1, 2 is bad). The keyword for further search is "distributed seeding" (with many approaches: leap-frogging, PRNG jumps, ...). Commented Apr 9, 2017 at 2:12
  • Thanks. But general-purpose, state-of-the-art packages like plyr do not seem to care about this. Does that mean they should not be used for this purpose? Commented Apr 9, 2017 at 6:54

1 Answer


Your concerns are justified: random number generation does not magically work in parallel, and further steps need to be taken. When using the foreach framework, you can use the doRNG extension to get statistically sound random numbers in parallel as well.

Example:

library("doParallel")
cl <- makeCluster(2)
registerDoParallel(cl)

## Declare that parallel RNG should be used in a parallel foreach() call.
## %dorng% will still result in parallel processing; it uses %dopar% internally.
library("doRNG")

y <- foreach(i = 1:100) %dorng% rnorm(1)
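
A side benefit of %dorng% is reproducibility. The following is a minimal sketch, assuming the doParallel and doRNG packages are installed; the seed value 42, the .combine = c argument and the variable names are illustrative choices, not part of the original example. Calling set.seed() with the same value before each %dorng% loop should give the same draws:

```r
library("doParallel")
library("doRNG")

cl <- makeCluster(2)
registerDoParallel(cl)

## Same seed before each loop (42 is an arbitrary illustrative value).
## doRNG derives one RNG stream per iteration from the current RNG state.
set.seed(42)
y1 <- foreach(i = 1:10, .combine = c) %dorng% rnorm(1)

set.seed(42)
y2 <- foreach(i = 1:10, .combine = c) %dorng% rnorm(1)

stopCluster(cl)

## Compare the values only; as.vector() drops the attributes that
## doRNG attaches to its results.
same_values  <- isTRUE(all.equal(as.vector(y1), as.vector(y2)))
all_distinct <- length(unique(as.vector(y1))) == length(y1)
```

Here same_values checks that the two runs agree, and all_distinct checks that the iterations did not all receive the same stream.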

EDIT 2020-08-04: Previously this answer proposed the alternative:

library("doRNG")
registerDoRNG()
y <- foreach(i = 1:100) %dopar% rnorm(1)

However, the downside of that approach is that it is harder for a developer to use registerDoRNG() cleanly inside functions. Because of this, I recommend using %dorng% to specify that parallel RNG should be used.
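
To illustrate the point about functions, here is a hedged sketch of a helper that uses %dorng% internally. The function name simulate_draws and its argument n are hypothetical and not from the doRNG package. Because the operator carries the RNG choice with the call itself, the function does not have to call registerDoRNG(), which, as far as I understand it, changes the backend registered for the whole session:

```r
library("doParallel")
library("doRNG")

## Hypothetical helper: draws n standard normal values in parallel.
simulate_draws <- function(n) {
  ## No registerDoRNG() here; %dorng% requests parallel-safe RNG
  ## for this one foreach() call only.
  foreach(i = seq_len(n), .combine = c) %dorng% rnorm(1)
}

cl <- makeCluster(2)
registerDoParallel(cl)
draws <- simulate_draws(5)
stopCluster(cl)
```

The caller still decides which parallel backend to register; the helper only states that its loop needs sound parallel random numbers.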


4 Comments

Thanks, +1. I've heard about doRNG but never used it. Could you comment on how using registerDoRNG() differs from using %dorng%? Are they the same?
I'm rusty on the details, but the doRNG vignette should answer your question.
By the way, does doFuture handle the RNG seeds in any way?
No, doFuture is just a thin layer wrapping future into the foreach framework - it passes RNG needs on to foreach / doRNG.
