I'm trying to compute model estimators using the BLB bootstrap, and I would like to do so in parallel. My code works fine when I don't run it in parallel. When I do, the results I get from each core contain NA values. I don't understand how I can get NA values when the iris data set contains no NAs at all. Here is the code I'm using:

library(doParallel)
library(itertools)

 num_of_cores <- detectCores()
 cl <- makePSOCKcluster(num_of_cores)
 registerDoParallel(cl)

 attach(iris)
 data <- iris
 coeftmp <- data.frame()
 system.time(
 r <- foreach(dat = isplitRows(data, chunks=num_of_cores),
             .combine = cbind) %dopar% {

                 BLBsize = round(nrow(dat)^0.6)
                 for (i in 1:400){
                         set.seed(i)

                         # sampling B(n) data points from the original data set without replacement
                         sample_BOFN <- dat[sample(nrow(dat), size = BLBsize, replace = FALSE), ]

                         # sampling from the subsample with replacement
                         sample_bootstrap <- sample_BOFN[sample(nrow(sample_BOFN), size = nrow(sample_BOFN), replace = TRUE), ]

                         bootstrapModel <- glm(sample_bootstrap$Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width, data = sample_bootstrap)
                         coeftmp <- rbind(coeftmp, bootstrapModel$coefficients)

                 }
                 #calculating the estimators of the model with mean
                  colMeans(coeftmp)

         })
  • Since I don't know how many cores you have, I'm not sure whether this question will solve your problem. But it might: stackoverflow.com/questions/33221779 Commented Nov 14, 2015 at 16:33
  • Also, it's unclear to me why you sample w/o replacement for sample_BOFN if you're bootstrapping. But it also doesn't appear that you're using sample_BOFN, so you may wish to remove this from the (example) code. Commented Nov 14, 2015 at 16:34
  • I'm trying to implement the BLB bootstrap, which requires sampling the subsamples w/o replacement, so that's why. Commented Nov 14, 2015 at 22:34
  • Actually, that link did not help: I have 4 cores, and I'm splitting my data set with the iterator into 4 chunks. I would like to train a model on each core with the BLB bootstrap. I don't understand how it's possible that I get NA values. (I'm running the code on a Mac, btw.) Commented Nov 15, 2015 at 11:08
  • Do you get NAs if you only use 1 core? Commented Nov 15, 2015 at 17:12

1 Answer

I think you're going to have to go through a few iterations of the debugger to solve this. But you're getting NAs from this line:

bootstrapModel <- glm(sample_bootstrap$Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width, data = sample_bootstrap)

My guess is that one of your sample_bootstraps produces a singular fit, since a singularity would give you an NA coefficient. It's possible that something else is causing this, but it's definitely coming from this line of code, so you'll need to step through the debugger to isolate it.
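As a minimal sketch of that kind of singularity (not one of the actual resamples from the question), assume a bootstrap sample that happens to contain only two distinct rows of iris; rows 1 and 2 are picked by hand here. With four coefficients to estimate, glm cannot identify all of them and reports the aliased ones as NA, and a single NA row is then enough to make colMeans return NA.

```r
# Hand-picked degenerate "bootstrap sample": only two distinct rows of iris.
# (The row choice is an assumption for illustration, not an actual resample.)
degenerate <- iris[c(1, 1, 1, 2, 2, 2), ]

fit <- glm(Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
           data = degenerate)

# Two distinct rows cannot support four coefficients, so some come back as NA.
print(coef(fit))
has_na <- anyNA(coef(fit))

# One NA entry is enough to turn the corresponding column mean into NA.
coefs <- rbind(c(1, 2), c(3, NA))
means <- colMeans(coefs)
print(means)

# na.rm = TRUE hides the symptom, but it does not fix the singular fits.
means_na_rm <- colMeans(coefs, na.rm = TRUE)
```

Dropping NAs with na.rm = TRUE would make the output look clean, but the affected fits would still be rank-deficient, so it is worth finding out how often they occur first.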

In other words, this is not a complete answer, but it should allow you to solve your own problem. You can see where the NAs come from by investigating:

r2 <- foreach(dat = isplitRows(data, chunks=1)) %dopar% {

     BLBsize = round(nrow(dat)^0.6)
     for (i in 1:400){
       set.seed(i)

       # sampling B(n) data points from the original data set without replacement
       sample_BOFN <- dat[sample(nrow(dat), size = BLBsize, replace = FALSE), ]

       # sampling from the subsample with replacement
       sample_bootstrap <- sample_BOFN[sample(nrow(sample_BOFN), size = nrow(sample_BOFN), replace = TRUE), ]

       bootstrapModel <- glm(sample_bootstrap$Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width, data = sample_bootstrap)
       coeftmp <- rbind(coeftmp, bootstrapModel$coefficients)

     }
     #calculating the estimators of the model with mean
     # return a list, not just the colMeans -- for debugging purposes
     return(list(coeftmp= coeftmp, result= colMeans(coeftmp)))

   }

   sum(is.na(r2[[1]][[1]])) # no missing coefficients with 1 core

r <- foreach(dat = isplitRows(data, chunks=num_of_cores)) %dopar% {

     BLBsize = round(nrow(dat)^0.6)
     for (i in 1:400){
       set.seed(i)

       # sampling B(n) data points from the original data set without replacement
       sample_BOFN <- dat[sample(nrow(dat), size = BLBsize, replace = FALSE), ]

       # sampling from the subsample with replacement
       sample_bootstrap <- sample_BOFN[sample(nrow(sample_BOFN), size = nrow(sample_BOFN), replace = TRUE), ]

       bootstrapModel <- glm(sample_bootstrap$Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width, data = sample_bootstrap)
       coeftmp <- rbind(coeftmp, bootstrapModel$coefficients)

     }
     #calculating the estimators of the model with mean
     # return a list, not just the colMeans -- for debugging purposes
     return(list(coeftmp= coeftmp, result= colMeans(coeftmp)))

   }

 # lots of missing values in your coeftmp results.
 lapply(r, function(l) {sum(is.na(l[[1]]))}) 
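
One thing that differs between the two runs is the subsample size. A rough check, assuming isplitRows hands each of the 4 workers about 37 or 38 of the 150 rows, shows each bootstrap sample shrinking from 20 rows to 9, which makes a rank-deficient fit more plausible. The sketch below also reruns the loop sequentially on one such chunk (rows 1 to 38, picked by hand) to list the seeds that give NA coefficients. blb_coefs is a hypothetical helper, not part of the original code.

```r
# Subsample sizes: one chunk of 150 rows vs. chunks of roughly 37-38 rows.
size_one_chunk   <- round(150^0.6)         # 20
size_four_chunks <- round(c(37, 38)^0.6)   # 9 and 9

# Hypothetical helper: the same loop as above, run sequentially, keeping
# one row of coefficients per seed so the failing seeds can be identified.
blb_coefs <- function(dat, n_iter = 400) {
  blb_size <- round(nrow(dat)^0.6)
  out <- matrix(NA_real_, nrow = n_iter, ncol = 4)
  for (i in seq_len(n_iter)) {
    set.seed(i)
    # subsample without replacement, then resample it with replacement
    sub  <- dat[sample(nrow(dat), size = blb_size, replace = FALSE), ]
    boot <- sub[sample(nrow(sub), size = nrow(sub), replace = TRUE), ]
    fit  <- glm(Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
                data = boot)
    out[i, ] <- coef(fit)
  }
  out
}

chunk <- iris[1:38, ]   # assumed to resemble the first of four chunks
coefs <- blb_coefs(chunk)

# Seeds whose fit returned at least one NA coefficient.
bad_seeds <- which(!complete.cases(coefs))
print(length(bad_seeds))
```

Rerunning the body of the loop with set.seed(bad_seeds[1]) should then let you inspect one offending sample_bootstrap directly, without a cluster in the way.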