
I'm running a simulation that repeats an independent calculation for a number of samples, and I want to parallelize it to speed it up. In each sample I generate some random numbers (using rnorm). I have read (and seen) that random numbers generated under doParallel are not reproducible, so I wanted to use doRNG, which does in fact generate the same random numbers regardless of the number of cores. What surprised me is that doRNG generates different numbers than a sequential for loop, even when I don't register a parallel backend and the calculation therefore runs sequentially (with the %dorng% operator I get the same results as with a parallel backend registered). With %dopar% and no parallel backend registered, however, I do get the same numbers as the for loop. Why is that? Can I somehow parametrize foreach/doRNG to get the same random numbers as in a sequential for loop? I wanted to use this as a check that I didn't break anything while moving to parallel.

Below is a simplified example (note that I do not register a parallel backend):

library(foreach)
library(doRNG)
library(doParallel)
RNGkind("L'Ecuyer-CMRG")

set.seed(123)
rn3 <- foreach(i=1:20, .combine = 'c') %dopar%{ 
  return(rnorm(1,0,1))
}


rn1 <- foreach(i=1:20, .combine = 'c', .options.RNG=123) %dorng%{ 
  return(rnorm(1,0,1))
}

set.seed(123)
rn2 <- foreach(i=1:20, .combine = 'c') %dorng%{ 
  return(rnorm(1,0,1))
}


rn4 <- rep(0,20)
set.seed(123)
for(i in 1:20){
  rn4[i] <- (rnorm(1,0,1))
}

identical(rn1, rn2) 
identical(rn1, rn3)
identical(rn1, rn4)
identical(rn3, rn4)

It shows that rn1 and rn2 (two different ways of setting the seed in %dorng%) are the same, as are rn3 and rn4 (%dopar% and the for loop); however, rn1/rn2 and rn3/rn4 do not match each other.

EDIT: I realized that different pseudo-random number generators are used. %dorng% uses L'Ecuyer-CMRG, while the base R default is Mersenne-Twister. However, when I set base R to L'Ecuyer-CMRG as well, only the first number matches. I've adjusted the code to set a different PRNG.

  • Maybe it helps to think of something like lapply(3:1, rnorm), where the second 'task' (X == 2) has to 'know' that the first task generated 3 random numbers in order to set the seed for the task correctly. But if the second and first tasks are running in parallel, how can the second task know that? Commented Mar 3, 2022 at 15:53
  • That may be the case when a parallel backend is registered; however, doRNG also differs when the calculation is sequential, while doParallel gives the same results in this case. Commented Mar 4, 2022 at 8:40
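
The point made in the first comment can be checked in base R. The sketch below is only an illustration (the seed 42 and the variable names are arbitrary choices, not from the thread): with a single shared stream, the numbers a task receives depend on how many numbers earlier tasks have already drawn.

```r
# One shared stream: tasks draw 3, 2 and 1 numbers in turn.
set.seed(42)
tasks <- lapply(3:1, rnorm)

# The same six numbers, drawn in a single call from the same seed.
set.seed(42)
all_draws <- rnorm(6)
same_stream <- identical(unlist(tasks), all_draws)

# Running the second task on its own from the same seed gives the
# first two numbers of the stream, not the ones it got after task 1.
set.seed(42)
second_alone <- rnorm(2)
matches_task2 <- identical(second_alone, tasks[[2]])
matches_head  <- identical(second_alone, all_draws[1:2])
```

Here same_stream and matches_head come out TRUE while matches_task2 is FALSE, which is why a task running in parallel cannot reproduce the sequential numbers without knowing how far the stream has advanced.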

1 Answer


OK, I finally found the reason (the comment I made a moment ago helped). What %dorng% does is generate a separate random seed for each value of i in the foreach. So, to get the same numbers as %dorng% in a for loop, we first need to use the L'Ecuyer-CMRG PRNG and then set the same seed before each iteration. The for loop that replicates this is:

RNGkind("L'Ecuyer-CMRG")

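rn6 <- numeric(20)
for(i in 1:20){
  # restore the seed that %dorng% used for iteration i (stored in the "rng"
  # attribute of rn1 from the question); assign() targets the global
  # environment so this also works when the loop is wrapped in a function
  assign(".Random.seed", attr(rn1, "rng")[[i]], envir = .GlobalEnv)
  rn6[i] <- rnorm(1, 0, 1)
}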
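
The idea of one seed per iteration can also be sketched with base R alone, using parallel::nextRNGStream() to derive independent L'Ecuyer-CMRG streams. This is a minimal sketch under stated assumptions, not a claim about how doRNG derives its seeds internally: the helper names make_streams and draw_one are hypothetical, and the numbers are not guaranteed to match those from %dorng%.

```r
library(parallel)  # ships with R; provides nextRNGStream()

RNGkind("L'Ecuyer-CMRG")

# Hypothetical helper: build n independent streams from one integer seed.
make_streams <- function(n, seed) {
  set.seed(seed)
  streams <- vector("list", n)
  streams[[1]] <- .Random.seed
  for (i in seq_len(n)[-1]) {
    streams[[i]] <- nextRNGStream(streams[[i - 1]])
  }
  streams
}

# Hypothetical helper: draw one normal variate from a given stream.
draw_one <- function(stream) {
  assign(".Random.seed", stream, envir = .GlobalEnv)
  rnorm(1, 0, 1)
}

streams <- make_streams(5, seed = 123)

# Because each iteration owns its stream, the order of execution
# no longer affects which number each iteration gets.
forward  <- vapply(streams, draw_one, numeric(1))
backward <- rev(vapply(rev(streams), draw_one, numeric(1)))
order_independent <- identical(forward, backward)
```

This also suggests why a plain for loop with a single set.seed() call cannot match: it walks along one stream, whereas each iteration here starts from its own.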
