I'm running a simulation that repeats an independent calculation for a number of samples, and I want to parallelize it to speed it up. In each sample I generate some random numbers (using rnorm). I have read (and seen) that random numbers generated under doParallel are not reproducible, so I wanted to use doRNG, which does in fact generate the same random numbers regardless of the number of cores. What surprised me is that doRNG generates different numbers than a sequential for loop, even when I don't register a parallel backend and the calculation therefore runs sequentially (with the %dorng% operator I get the same results as with a parallel backend registered). With %dopar% and no parallel backend registered, however, I do get the same numbers as the for loop. Why is that? Can I somehow parametrize foreach/doRNG to get the same random numbers as in a sequential for loop? I wanted to use this as a check that I didn't mess anything up while moving to parallel.
Below is a simplified example (note that I do not register a parallel backend):
library(foreach)
library(doRNG)
library(doParallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
rn3 <- foreach(i=1:20, .combine = 'c') %dopar%{
return(rnorm(1,0,1))
}
rn1 <- foreach(i=1:20, .combine = 'c', .options.RNG=123) %dorng%{
return(rnorm(1,0,1))
}
set.seed(123)
rn2 <- foreach(i=1:20, .combine = 'c') %dorng%{
return(rnorm(1,0,1))
}
rn4 <- rep(0,20)
set.seed(123)
for(i in 1:20){
rn4[i] <- (rnorm(1,0,1))
}
identical(rn1, rn2)
identical(rn1, rn3)
identical(rn1, rn4)
identical(rn3, rn4)
The output shows that rn1 and rn2 (two different ways of setting the seed in %dorng%) are identical, as are rn3 and rn4 (%dopar% without a registered backend and the for loop); however, rn1/rn2 and rn3/rn4 do not match each other.
EDIT: I realized that different pseudo-random number generators are employed: %dorng% uses L'Ecuyer-CMRG, while the default in base R is Mersenne-Twister. However, when I set base R to L'Ecuyer-CMRG as well, only the first number matches. I've adjusted the code above to set the different PRNG.
Consider lapply(3:1, rnorm), where the second 'task' (X == 2) has to 'know' that the first task generated 3 random numbers in order to set the seed for that task correctly. But if the first and second tasks are running in parallel, how can the second task know that?
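To make this concrete, here is a minimal sketch in base R (using only the parallel package, not doRNG itself) of the difference between one shared stream and one stream per task. It assumes that giving each task its own L'Ecuyer-CMRG stream via nextRNGStream() is a fair stand-in for what a reproducible parallel backend has to do; the variable names seq_draws, stream_draws and s are only for illustration.

```r
library(parallel)  # ships with R; provides nextRNGStream()

RNGkind("L'Ecuyer-CMRG")

# Sequential loop: every draw advances the same stream, so draw i
# depends on how many numbers were consumed by draws 1..(i-1).
set.seed(123)
seq_draws <- numeric(3)
for (i in 1:3) seq_draws[i] <- rnorm(1)

# Per-task streams: each task starts from its own stream seed, so it
# does not need to know what any other task has consumed.
set.seed(123)
s <- .Random.seed                 # seed of the first stream
stream_draws <- numeric(3)
for (i in 1:3) {
  assign(".Random.seed", s, envir = globalenv())  # switch to task i's stream
  stream_draws[i] <- rnorm(1)
  s <- nextRNGStream(s)           # jump ahead to the next independent stream
}

# The first task starts from the same seed as the sequential loop,
# so only the first value agrees.
seq_draws[1] == stream_draws[1]
identical(seq_draws, stream_draws)
```

The first comparison is TRUE and the second is FALSE, which is the same pattern as in the question: only the first number matches. Under this assumption, a for loop reproduces the per-task results only if it also switches to a new stream at every iteration, as the second loop does.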