doRNG and sequential random numbers differ with same seed - R (foreach, doParallel, doRNG)

Question

I'm running a simulation that repeats an independent calculation for a number of samples, and I want to parallelize it to speed it up. In each sample I generate some random numbers (using rnorm). I have read (and seen) that random numbers generated under doParallel are not reproducible, so I wanted to use doRNG, which does generate the same random numbers regardless of the number of cores. What surprised me is that doRNG generates different numbers than a sequential for loop, even when I don't register a parallel backend, so the calculation runs sequentially (with the %dorng% operator I get the same results as with a parallel backend registered). With %dopar% and no parallel backend registered, however, I do get the same numbers as the for loop. Why is that? Can I somehow parametrize foreach/doRNG to get the same random numbers as in the sequential for loop? I wanted to use this as a check that I didn't break anything while moving to parallel.

Below is a simplified example (notice that I do not register parallel backend):

library(foreach)
library(doRNG)
library(doParallel)
RNGkind("L'Ecuyer-CMRG")

set.seed(123)
rn3 <- foreach(i=1:20, .combine = 'c') %dopar%{ 
  return(rnorm(1,0,1))
}


rn1 <- foreach(i=1:20, .combine = 'c', .options.RNG=123) %dorng%{ 
  return(rnorm(1,0,1))
}

set.seed(123)
rn2 <- foreach(i=1:20, .combine = 'c') %dorng%{ 
  return(rnorm(1,0,1))
}


rn4 <- rep(0,20)
set.seed(123)
for(i in 1:20){
  rn4[i] <- (rnorm(1,0,1))
}

identical(rn1, rn2) 
identical(rn1, rn3)
identical(rn1, rn4)
identical(rn3, rn4)

It shows that rn1 and rn2 (two different ways of setting the seed in %dorng%) are identical, as are rn3 and rn4 (%dopar% and the for loop); however, rn1/rn2 and rn3/rn4 do not match each other.

EDIT: I realized that different pseudo-random number generators are involved. %dorng% uses L'Ecuyer-CMRG, while the base R default is Mersenne-Twister. However, when I set base R to L'Ecuyer-CMRG as well, only the first number matches. I've adjusted the code above to set the PRNG explicitly.
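
To illustrate the point about generator kinds, here is a minimal sketch using base R only. It only shows that the same seed under Mersenne-Twister and under L'Ecuyer-CMRG produces different draws; the variable names `mt` and `lec` are made up for this example.

```r
# Same seed, two different generators.
RNGkind("Mersenne-Twister")
set.seed(123)
mt <- rnorm(3)

RNGkind("L'Ecuyer-CMRG")
set.seed(123)
lec <- rnorm(3)

# The seed is the same, but the underlying generator is not,
# so the two vectors are not expected to match.
print(identical(mt, lec))
```

Note that the snippet leaves the session on L'Ecuyer-CMRG, which is the kind the example code in the question sets as well.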

Maybe it helps to think of something like lapply(3:1, rnorm), where the second 'task' (X == 2) has to 'know' that the first task generated 3 random numbers in order to set the seed for the task correctly. But if the second and first tasks are running in parallel, how can the second task know that?
– Martin Morgan, Commented Mar 3, 2022 at 15:53
That may be the case when a parallel backend is registered; however, doRNG also differs when the calculation is sequential, while doParallel gives the same results in that case.
– user7473630, Commented Mar 4, 2022 at 8:40

user7473630 · Accepted Answer · 2022-03-04 16:22:52Z

OK, I finally found the reason (the comment I made a moment ago helped). What %dorng% does is generate a separate random seed for each value of i in the foreach loop. Because of that, to get the same numbers as %dorng% with a for loop, we first need to use the L'Ecuyer-CMRG PRNG, and then set the same per-iteration seeds. Code that replicates this in a for loop:

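RNGkind("L'Ecuyer-CMRG")

rn6 <- rep(0, 20)
for (i in 1:20) {
  # Restore the seed that %dorng% used for iteration i
  # (stored in the "rng" attribute of rn1 from the question).
  # Run this at top level so .Random.seed is set in the global environment.
  .Random.seed <- attr(rn1, "rng")[[i]]
  rn6[i] <- rnorm(1, 0, 1)
}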
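
The per-iteration seeding described above can also be sketched without doRNG, using only `nextRNGStream()` from the parallel package that ships with R. This is an assumption about the general mechanism (one L'Ecuyer-CMRG stream per iteration), not doRNG's actual code, and the names `shared` and `per_task` are made up for the illustration.

```r
library(parallel)  # ships with R; provides nextRNGStream()

RNGkind("L'Ecuyer-CMRG")

# Plain for loop: every iteration keeps drawing from one shared stream.
set.seed(123)
shared <- numeric(5)
for (i in 1:5) shared[i] <- rnorm(1)

# One stream per iteration: each iteration starts from its own seed.
set.seed(123)
seed <- .Random.seed            # starting seed of the first stream
per_task <- numeric(5)
for (i in 1:5) {
  assign(".Random.seed", seed, envir = globalenv())  # switch to stream i
  per_task[i] <- rnorm(1)
  seed <- nextRNGStream(seed)   # jump to the start of the next stream
}

print(identical(shared[1], per_task[1]))    # both start from the same state
print(identical(shared[-1], per_task[-1]))  # later draws come from different streams
```

In this sketch the first draw agrees because both loops start from the same state, while later draws come from different streams. That is consistent with the observation in the question that only the first number matches once both sides use L'Ecuyer-CMRG.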
