I would like to run a logistic regression on every pair of variables in my dataset, fitting each unordered pair only once (so if Var1 has been regressed on Var2, I do not also need Var2 on Var1). All variables are binary (0/1). The output should include the pair tested and the test statistics. Since I need to apply this to several datasets, I am trying to write a script that works on datasets with a varying number of variables, all of them binary.
The sample dataset contains 6 variables named Var1 to Var6, with 50 observations each.
library(data.table)
# Six binary variables with different success probabilities
Var1 = rbinom(50, 1, 0.5)
Var2 = rbinom(50, 1, 0.25)
Var3 = rbinom(50, 1, 0.6)
Var4 = rbinom(50, 1, 0.2)
Var5 = rbinom(50, 1, 0.3)
Var6 = rbinom(50, 1, 0.8)
dt = data.table(Var1, Var2, Var3, Var4, Var5, Var6)
head(dt)
  Var1 Var2 Var3 Var4 Var5 Var6
1    1    0    1    1    0    1
2    1    0    0    0    0    1
3    0    0    1    0    0    1
4    1    0    1    0    1    0
5    1    0    1    1    0    1
6    0    1    1    1    0    0
So I would like to regress Var1 on each of Var2 to Var6 (one model per pair), then Var2 on each of Var3 to Var6, and so on. The output table should contain the columns Dependent_var, Independent_var, Estimate, Stat and P_value.
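To make the pairing scheme above concrete, here is a minimal sketch of how the unordered pairs could be enumerated with base R's `combn()`. The names `vars` and `pair_table` are made up for illustration, and the variable names are hard-coded to match the sample data; on a real dataset `names(dt)` would presumably take their place.

```r
# Assumption: variable names match the sample data (Var1 ... Var6)
vars <- paste0("Var", 1:6)

# combn() returns a 2 x choose(n, 2) matrix; each column is one unordered pair,
# ordered as (Var1, Var2), (Var1, Var3), ..., (Var5, Var6)
pairs <- combn(vars, 2)

# First element of each pair is treated as the dependent variable,
# second as the independent variable
pair_table <- data.frame(
  Dependent_var   = pairs[1, ],
  Independent_var = pairs[2, ],
  stringsAsFactors = FALSE
)

nrow(pair_table)     # choose(6, 2) = 15 pairs
head(pair_table, 5)  # Var1 paired with Var2, Var3, Var4, Var5, Var6
```

With n variables this gives n * (n - 1) / 2 rows, which is the number of models that need to be fitted.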
I've made an output table:
n <- ncol(dt)
# One row per unordered pair of variables: choose(n, 2) = n * (n - 1) / 2
output <- data.table(matrix(nrow = n * (n - 1) / 2, ncol = 5))
names(output) <- c("Dependent_var", "Independent_var", "Estimate", "Stat", "P_value")
head(output)
   Dependent_var Independent_var Estimate Stat P_value
1:            NA              NA       NA   NA      NA
2:            NA              NA       NA   NA      NA
3:            NA              NA       NA   NA      NA
4:            NA              NA       NA   NA      NA
5:            NA              NA       NA   NA      NA
6:            NA              NA       NA   NA      NA
Now I am not sure how to loop one variable against all the remaining ones, repeat that for every variable, and then fill in the output table correctly. Any help is very much appreciated!
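One possible way to approach this, sketched under a few assumptions: fit one `glm(..., family = binomial)` per pair, take the slope row from `coef(summary(fit))`, and stack the per-pair rows with `rbindlist()` rather than filling a pre-allocated table. The helper `fit_pair` is a hypothetical name, the seed is added only for reproducibility, and reporting the Wald z value as Stat is a guess at what is wanted.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so this sketch is reproducible
dt <- data.table(
  Var1 = rbinom(50, 1, 0.5), Var2 = rbinom(50, 1, 0.25),
  Var3 = rbinom(50, 1, 0.6), Var4 = rbinom(50, 1, 0.2),
  Var5 = rbinom(50, 1, 0.3), Var6 = rbinom(50, 1, 0.8)
)

# Hypothetical helper: fit dep ~ indep and return a one-row result table
fit_pair <- function(dep, indep, data) {
  fit   <- glm(reformulate(indep, response = dep), data = data, family = binomial)
  coefs <- coef(summary(fit))
  # If the slope could not be estimated (e.g. a constant predictor),
  # its row is missing from the summary, so return NAs for that pair
  if (!indep %in% rownames(coefs)) {
    return(data.table(Dependent_var = dep, Independent_var = indep,
                      Estimate = NA_real_, Stat = NA_real_, P_value = NA_real_))
  }
  data.table(
    Dependent_var   = dep,
    Independent_var = indep,
    Estimate        = coefs[indep, "Estimate"],   # log-odds slope
    Stat            = coefs[indep, "z value"],    # Wald z statistic
    P_value         = coefs[indep, "Pr(>|z|)"]
  )
}

# Each unordered pair once: (Var1, Var2), (Var1, Var3), ..., (Var5, Var6)
pairs  <- combn(names(dt), 2, simplify = FALSE)
output <- rbindlist(lapply(pairs, function(p) fit_pair(p[1], p[2], dt)))

print(output)
```

Because the pairs are derived from `names(dt)`, the same code should carry over to datasets with a different number of binary columns. With this many tests, some form of multiple-comparison adjustment (for example `p.adjust(output$P_value)`) may be worth considering.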