0

I am running regressions on a data set with > 4 million observations. I am trying to do a sensitivity analysis to see how sensitive my regression coefficients are to different sample sizes. I want to:

  • Take a random sample of n data points
  • Run a regression on the sample
  • Extract the coefficient

I want to replicate the above steps 400 times to create a distribution of coefficients and plot the mean and confidence intervals. I have created two functions to do this. The first, coef_sampler, runs a regression on a sample of the data and returns the coefficient:

coef_sampler <- function(data,n){
      model <- lm(y ~ x, data = data[sample(nrow(data), n),])
      return(model[["coefficients"]][[1]])
  }

The second, my_function, replicates this process 400 times for different sample sizes, generating vectors of coefficients for different sample sizes. It then calculates the mean of each vector along with it's upper and lower bounds using the CI() function from Rmisc:

my_function <- function(data, n, nrep, to, by){

  for (j in seq.int(n, to, by)){
    plist <- data.frame(pval = replicate(nrep, coef_sampler(data,j)))

    if(!exists("ci_mat")) {
      ci_mat <- data.frame(CI(plist[,1]))
    } else {
      ci_mat <- cbind(ci_mat, data.frame(CI(plist[,1])))
    }
  }
}

When I just run the for loop with the values I want for number of repetitions, and a sequence of sample sizes, it works fine. When I call my_function() with the above values, it simply doesn't work. No data frame of means and confidence intervals is generated. Note that I call the coef_sampler() function within my_function, could this be causing the problem? Or is it using a for loop that uses the function arguments causing the problem.

Hopefully my question makes sense, any and all help would be greatly appreciated!

3
  • Where does the CIfunction comes from? Commented Nov 17, 2019 at 21:58
  • CI is from Rmisc. It just calculates confidence intervals and returns the upper bound, mean, and lower bound Commented Nov 17, 2019 at 22:00
  • Just a general reminder: for a reproducible minimal example, please include all needed packages along with some sample data, e.g. with dput() Commented Nov 17, 2019 at 22:01

2 Answers 2

2

Just include a return at the end of your function

library(Rmisc)

my_data <- tibble(x = 1:5,
                  y = rnorm(5))

coef_sampler <- function(data,n){
  model <- lm(y ~ x, data = data[sample(nrow(data),n),])
  return(model[["coefficients"]][[1]])
}

my_function <- function(data,n,nrep,to,by){
  for (j in seq.int(n,to,by)){
    plist <- data.frame(pval = replicate(nrep,coef_sampler(data,j)))

    if(!exists("ci_mat")){
      ci_mat <- data.frame(CI(plist[,1]))
    } else {
      ci_mat <- cbind(ci_mat, data.frame(CI(plist[,1])))
    }
  }
  return(ci_mat)
}

my_function(my_data, 1, 2, 5, 1)
Sign up to request clarification or add additional context in comments.

1 Comment

Will upvote this as well because you basically posted at the exact same time. Thank you!
1

The problem is that your function does not return the populated ci_mat. This variables only lives in the function environment and gets destroyed after the function run:

df <- data.frame(x = 1:100, y = 4*(1:100) + 5 + rnorm(100,5,10))

coef_sampler <- function(data,n){
  model <- lm(y ~ x, data = data[sample(nrow(data),n),])
  return(model[["coefficients"]][[1]])
}

my_function <- function(data,n,nrep,to,by){

  for (j in seq.int(n,to,by)){
    plist <- data.frame(pval = replicate(nrep,coef_sampler(data,j)))

    if(!exists("ci_mat")){
      ci_mat <- data.frame(Rmisc::CI(plist[,1]))
    }else{
      ci_mat <- cbind(ci_mat, data.frame(Rmisc::CI(plist[,1])))
    }
  }
  ci_mat
}    

# sample call
my_function(df, 5, 10, 100,1)

Make your function return ci_mat.

1 Comment

Wow. I probably should have seen that. Works great now, I love stackoverflow! Thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.