I would like some help writing a combine function for foreach(). Consider the function below:

library(mvtnorm)
library(doMC)

mySimFunc <- function() {
  myNum <- runif(1)
  myVec <- rnorm(10)
  myMat <- rmvnorm(5, rep(0, 3), diag(3))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}

Now I'd like to run the function above 1000 times using foreach() %dopar%, and in each iteration I'd like to:

  1. return myNum as is
  2. compute the mean of myVec and return it
  3. compute colMeans() of myMat and return it.

I'd like foreach() %dopar% to return a final list containing:

  1. a vector of length 1000 holding the myNum value from each iteration
  2. a vector of length 1000 holding the mean of myVec from each iteration
  3. a matrix with 1000 rows, where each row holds the colMeans() of myMat from that iteration

My ideal solution

My ideal solution is to find a way to make foreach() act exactly like a for loop, so that I can simply define:

myNumRslt <- NULL
myVecRslt <- NULL
myMatRslt <- NULL

# and then simply aggregate result of each iteration to the variables above as:
foreach(i = 1:1000) %dopar%{
   rslt <- mySimFunc()
   myNumRslt <- c(myNumRslt, rslt$myNum)
   myVecRslt <- c(myVecRslt, mean(rslt$myVec))
   myMatRslt.tmp <- colMeans(rslt$myMat)
   myMatRslt <- rbind(myMatRslt, myMatRslt.tmp)
}

Unfortunately, it seems this is not possible with foreach(), so I think the only solution is to write a combine function that performs the same aggregation as the loop above.

Challenge

1) How could I write a combine function that returns what I explained above?

2) With %dopar% (say, using the doMC package), does doMC send each iteration to one CPU, or does it go further and split each iteration into smaller pieces and distribute those?

3) Is there a better (more efficient) way than using doMC and foreach()?

Ideas

In this question Brian mentioned a brilliant way to deal with lists of numeric values. In my case, I have numeric values as well as vectors and matrices, and I don't know how to extend Brian's idea to my case.

Thanks very much for your help.

  • Do you have to use .combine? If so, why? Commented May 5, 2014 at 9:52
  • Hi @BenRollert, I added some clarification to the question. Please read the "ideal solution" in my question (just added to the question). As I explained, my ideal solution is to find a way that foreach() behaves exactly like for so that I would be able to do "result" aggregation in each iteration using "c" and rbind(). So the answer is no; I don't have to use .combine. Commented May 5, 2014 at 15:33
  • From what I understand, .combine will operate on the list output, i.e. rbind in .combine is the same thing as rbinding the list output. It is entirely possible there is some performance benefit I'm unaware of for edge cases, which is why I'm very interested to see other people's responses. Commented May 5, 2014 at 15:39

1 Answer

Edit

Cleaned up, generalizable solution using .combine:

# Modified function: do the per-iteration aggregation inside the function
mySimFunc2 <- function() {
  myNum <- runif(1)
  myVec <- mean(rnorm(10))
  myMat <- colMeans(rmvnorm(5, rep(0, 3), diag(3)))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}

# .combine function: receives the per-iteration results as separate arguments
MyComb1 <- function(...) {
  lst <- list(...)
  # element 1 of each result is myNum, element 2 the mean of myVec
  vec <- sapply(seq_along(lst), function(i) lst[[i]][[1]])
  vecavg <- sapply(seq_along(lst), function(i) lst[[i]][[2]])
  # element 3 is the colMeans vector; transpose so each iteration is a row
  colmeans <- t(sapply(seq_along(lst), function(i) lst[[i]][[3]]))
  final <- list(vec, vecavg, colmeans)
  names(final) <- c("vec", "vecavg", "colmeans")
  return(final)
}

library(doParallel)
cl <- makeCluster(3)  # number of worker processes
registerDoParallel(cl)

rslt <- foreach(i = 1:1000, .export = c("mySimFunc2", "MyComb1"), .combine = MyComb1,
                .multicombine = TRUE, .maxcombine = 1000, .packages = c("mvtnorm")) %dopar% {
  mySimFunc2()
}

stopCluster(cl)  # release the workers when done

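You should now have a list output containing the three desired objects, which I've named vec, vecavg, and colmeans. Note that .maxcombine defaults to 100, so with more than 100 iterations you must raise it to at least the number of iterations. Otherwise MyComb1 is called more than once and receives an already-combined result among its arguments, which it is not written to handle.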
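If tying .maxcombine to the iteration count is inconvenient, one alternative is a two-argument combiner that accepts either a raw result or an already-combined one. The sketch below is only an illustration under some assumptions: simOnce() is a base R stand-in for mySimFunc2() (so it runs without mvtnorm), combinePair() is a hypothetical name, and Reduce() is used to imitate the pairwise folding rather than an actual foreach() call.

```r
# Stand-in for mySimFunc2(), base R only (assumption for this sketch).
# matrix(rnorm(15), 5, 3) has the same distribution as
# rmvnorm(5, rep(0, 3), diag(3)): independent standard normal columns.
simOnce <- function() {
  list(
    myNum = runif(1),
    myVec = mean(rnorm(10)),
    myMat = colMeans(matrix(rnorm(15), nrow = 5, ncol = 3))
  )
}

# Hypothetical pairwise combiner: each argument is either one raw result
# or a partial result built by earlier calls; both carry the same names.
combinePair <- function(a, b) {
  list(
    myNum = c(a$myNum, b$myNum),
    myVec = c(a$myVec, b$myVec),
    myMat = rbind(a$myMat, b$myMat, deparse.level = 0)
  )
}

# Reduce() folds the results two at a time, imitating how a
# two-argument combine function is applied.
set.seed(1)
raw <- lapply(1:250, function(i) simOnce())
out <- Reduce(combinePair, raw)

length(out$myNum)  # 250
dim(out$myMat)     # 250 3
```

With foreach(), this would be passed as .combine = combinePair while leaving .multicombine at its default, so the function is always called with two arguments. The trade-off is that the result is regrown on every call, which may be slower than a single multi-argument combine when there are many iterations.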

As a side note, it does not make sense to parallelize for this example task, although I'm guessing the real task may be more complex.
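For comparison, here is a rough sequential baseline that skips .combine entirely and reshapes a plain list of results afterwards. It is a sketch under assumptions: simOnce() is a base R stand-in for mySimFunc2() (no mvtnorm needed), and lapply() stands in for the list that foreach() returns when no .combine is given.

```r
# Stand-in for mySimFunc2(), base R only (assumption for this sketch).
simOnce <- function() {
  list(
    myNum = runif(1),
    myVec = mean(rnorm(10)),
    myMat = colMeans(matrix(rnorm(15), nrow = 5, ncol = 3))
  )
}

set.seed(42)
# One list element per iteration, like foreach() output without .combine.
res <- lapply(seq_len(1000), function(i) simOnce())

# Pull each component out of the list of results.
myNumRslt <- vapply(res, function(r) r$myNum, numeric(1))
myVecRslt <- vapply(res, function(r) r$myVec, numeric(1))
myMatRslt <- do.call(rbind, lapply(res, function(r) r$myMat))

length(myNumRslt)  # 1000
dim(myMatRslt)     # 1000 3
```

The three extraction lines do not depend on how res was produced, so the same post-processing should apply to a list returned by a parallel run.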

4 Comments

Hi @BenRollert, thanks for the answer. Your answer is definitely a solution to my question (+1), but unfortunately it is not practical for my original code, in which a function generates 14 different results per iteration, including matrices. If I run my foreach() and store all of the outputs in a list, I will probably hit a stack overflow issue.
Ben, now that I think about it more, I can see that your solution can fix my issue entirely. I'll wait a couple of days to see whether someone comes up with a more efficient way with less overhead. If not, your solution will definitely be accepted. Thanks very much for your help.
You're still limited to the RAM on your machine either way. And if you aggregate within your function (as in my second solution), the list output should be about the same size as your final result.
Ok, cleaned up the solution using .combine and made it generalize to n iterations
