I would like to run a logistic regression on each pair of variables in my dataset, without repeating pairs that have already been regressed. All variables are binomial. The output should include the pair tested and the test statistics. Because I have to apply this to several datasets, I am trying to write a script that works on datasets containing varying numbers of variables, all of them binomial.

The sample dataset contains 6 variables named Var1 to Var6 with 50 observations each.

library(data.table)

# Six binary variables with different success probabilities
Var1 = rbinom(50, 1, 0.5)
Var2 = rbinom(50, 1, 0.25)
Var3 = rbinom(50, 1, 0.6)
Var4 = rbinom(50, 1, 0.2)
Var5 = rbinom(50, 1, 0.3)
Var6 = rbinom(50, 1, 0.8)

dt = data.table(Var1, Var2, Var3, Var4, Var5, Var6)
head(dt)

  Var1 Var2 Var3 Var4 Var5 Var6
1    1    0    1    1    0    1
2    1    0    0    0    0    1
3    0    0    1    0    0    1
4    1    0    1    0    1    0
5    1    0    1    1    0    1
6    0    1    1    1    0    0

So I would like to regress Var1 on each of Var2 to Var6, Var2 on each of Var3 to Var6, and so on. The output table should contain the columns Dependent_var, Independent_var, Estimate, Stat and P_value.

I've made an output table:

n = ncol(dt)
# One row per unordered pair of distinct variables: n*(n-1)/2 (15 rows for n = 6)
output <- data.table(matrix(nrow = (n * (n - 1)) / 2, ncol = 5))
names(output) = c("Dependent_var", "Independent_var", "Estimate", "Stat", "P_value")
head(output)

   Dependent_var Independent_var Estimate Stat P_value
1:            NA              NA       NA   NA      NA
2:            NA              NA       NA   NA      NA
3:            NA              NA       NA   NA      NA
4:            NA              NA       NA   NA      NA
5:            NA              NA       NA   NA      NA
6:            NA              NA       NA   NA      NA

Now I am not sure how to loop one variable against all the others, then repeat that for every variable, and finally fill in the output table correctly. Any help is very much appreciated!

1 Answer

I have sketched out a simple procedure, see if it helps:

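my_func <- function(x) {
  # Regress each column in turn on all the remaining columns
  for (i in seq_len(ncol(x))) {
    # Name the response explicitly so that `.` expands to the other columns only;
    # with `x[, i] ~ .` the response column would also appear as a predictor
    f <- as.formula(paste(names(x)[i], "~ ."))
    fit <- glm(f, data = x, family = binomial(link = "logit"))
    print(summary(fit))
  }
}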

The parameter x is a dataframe.
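
If what you need is one regression per pair with the results collected in a table, something along these lines might work. This is only a sketch under a few assumptions: `pairwise_logit` is a hypothetical helper name, `dat` and `res` are illustrative, and "Stat" is taken to be the Wald z value that `summary()` reports for a `glm` fit. It uses `combn()` to list each unordered pair once, so Var1 is regressed on Var2 to Var6, Var2 on Var3 to Var6, and so on.

```r
# Hypothetical helper (not part of the original answer): fits one logistic
# regression per unordered pair of columns and collects the results.
pairwise_logit <- function(df) {
  df <- as.data.frame(df)                 # also accepts a data.table
  pairs <- combn(names(df), 2)            # 2 x choose(n, 2) matrix, one pair per column
  rows <- lapply(seq_len(ncol(pairs)), function(k) {
    dep <- pairs[1, k]
    ind <- pairs[2, k]
    fit <- glm(reformulate(ind, response = dep), data = df,
               family = binomial(link = "logit"))
    coefs <- coef(summary(fit))           # Estimate, Std. Error, z value, Pr(>|z|)
    has_slope <- nrow(coefs) >= 2         # slope row is dropped if the predictor is constant
    data.frame(
      Dependent_var   = dep,
      Independent_var = ind,
      Estimate        = if (has_slope) coefs[2, "Estimate"] else NA_real_,
      Stat            = if (has_slope) coefs[2, "z value"] else NA_real_,
      P_value         = if (has_slope) coefs[2, "Pr(>|z|)"] else NA_real_
    )
  })
  do.call(rbind, rows)
}

# Illustrative data shaped like the sample in the question
set.seed(1)
probs <- c(0.5, 0.25, 0.6, 0.2, 0.3, 0.8)
dat <- as.data.frame(lapply(probs, function(p) rbinom(50, 1, p)))
names(dat) <- paste0("Var", 1:6)

res <- pairwise_logit(dat)
head(res)
```

Building the rows in a list and binding them at the end avoids having to pre-allocate the output table and index into it. Also note that defining a function prints nothing by itself; it has to be called on the data, e.g. `my_func(as.data.frame(dt))` or `pairwise_logit(dt)`.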

2 Comments

Thanks for your quick answer @jsb!
I seem to be doing something wrong, though. I turned my dt into a data frame and then tried to run the loop, but I am not getting any output. How can I get the results out? P.S. Sorry, I can't seem to get the code properly formatted in the comments section. dt <- data.frame(dt) my_func <- function(dt) { for (i in 1:ncol(dt)) { fit <- glm(dt[, i] ~ ., data = dt, family = binomial(link = "logit")) print(summary(fit)) } }