1

I want to do logistic regressions for several (n = 30) SNPs (coded as 0,1,2) as predictors and a casecontrol variable (0,1) as an outcome. As few of those rs are correlated, I cannot put all rs# in one model but have to run one at a time regression for each i.e., I cannot simply plus them together in one model like rs1 + rs2 + rs3 and so on....I need to have each regressed separately like below;

test1 = glm(casecontrol ~ rs1, data = mydata, family=binomial)

test2 = glm(casecontrol ~ rs2, data = mydata, family=binomial)

test3 = glm(casecontrol ~ rs3, data = mydata, family=binomial)

While I can run all the above regressions separately, is there a way to loop them together so I could get a summary() of all tests in one go?

I will have to adjust for age and sex too, but that would come after I run an unadjusted loop.

My data head from dput(head(mydata)) for example;

structure(list(ID = 1:6, sex = c(2L, 1L, 2L, 1L, 1L, 1L), age = c(52.4725405022036, 
58.4303618001286, 44.5300065923948, 61.4786037395243, 67.851808819687, 
39.7451378498226), bmi = c(31.4068751083687, 32.0614937413484, 
23.205021363683, 29.1445372393355, 32.6287483051419, 20.5887741968036
), casecontrol = c(0L, 1L, 0L, 1L, 1L, 1L), rs1 = c(1L, 0L, 2L, 
2L, 1L, 2L), rs2 = c(2L, 1L, 2L, 0L, 1L, 1L), rs3 = c(1L, 0L, 
1L, 2L, 2L, 2L), rs4 = c(1L, 1L, 1L, 1L, 0L, 2L), rs5 = c(1L, 
0L, 0L, 0L, 1L, 2L), rs6 = c(1L, 1L, 1L, 1L, 1L, 2L), rs7 = c(0L, 
0L, 0L, 0L, 0L, 0L), rs8 = c(0L, 0L, 1L, 0L, 2L, 1L), rs9 = c(0L, 
0L, 2L, 1L, 1L, 0L), rs10 = c(2L, 0L, 0L, 2L, 2L, 1L), rs11 = c(0L, 
1L, 1L, 0L, 1L, 1L), rs12 = c(1L, 2L, 0L, 1L, 2L, 2L), rs13 = c(0L, 
2L, 0L, 0L, 0L, 0L), rs14 = c(1L, 1L, 1L, 1L, 2L, 2L), rs15 = c(1L, 
2L, 1L, 1L, 0L, 1L), rs16 = c(0L, 2L, 1L, 2L, 2L, 1L), rs17 = c(0L, 
2L, 1L, 1L, 2L, 2L), rs18 = c(1L, 2L, 2L, 1L, 1L, 1L), rs19 = c(1L, 
1L, 0L, 1L, 2L, 2L), rs20 = c(2L, 1L, 0L, 2L, 2L, 1L), rs21 = c(1L, 
2L, 2L, 1L, 1L, 0L), rs22 = c(1L, 1L, 2L, 2L, 0L, 1L), rs23 = c(2L, 
0L, 2L, 1L, 1L, 1L), rs24 = c(0L, 0L, 0L, 2L, 2L, 2L), rs25 = c(2L, 
2L, 1L, 1L, 0L, 1L), rs26 = c(1L, 1L, 0L, 2L, 0L, 1L), rs27 = c(1L, 
1L, 1L, 1L, 0L, 1L), rs28 = c(0L, 1L, 1L, 2L, 0L, 2L), rs29 = c(2L, 
2L, 2L, 2L, 1L, 2L), rs30 = c(0L, 2L, 1L, 2L, 1L, 0L)), row.names = c(NA, 
6L), class = "data.frame")```
2
  • there are lots of answered questions of this general type on SO already - most are about linear regression but the general method is the same, something like this: stackoverflow.com/questions/28373227/r-regressions-in-a-loop Commented Jun 25, 2020 at 11:06
  • You can try the logistic.display function from the epiDisplay package. You can add all predictors in one model and the output shows crude ORs, CIs and p-values (together with the adjusted results) and p-values from the LR test. The function does the looping for you. Commented Jun 25, 2020 at 11:13

2 Answers 2

2

Probably you want something like this:

lapply(1:30, function(i) glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))

which will execute 30 logistic regressions with the selected predictor.

Instead of hard coding the overall number of predictors, you can use: sum(grepl('rs', names(mydata))), which will return 30.

You can use tidy function from broom package to get the summary in a tidy format.

purrr::map_dfr(1:30, function(i) data.frame(model = i, tidy(glm(as.formula(paste0('casecontrol ~ ', 'rs', i)), data = mydata, family = binomial))))

or you can do this in a more dynamic way:

names(mydata)[grepl('rs', names(mydata))] -> pred #get all predictors that contain 'rs'

purrr::map_dfr(1:length(pred), 
               function(i) data.frame(model = i, 
                                      tidy(glm(as.formula(paste0('casecontrol ~ ', pred[i])), data = mydata, family = binomial))))

If you want to include another variable, you simply need to adjust the pred vector.

c(pred, paste0(pred, ' + age')) -> pred #interaction between rs drivers and age

or

c(pred, paste0(pred, ' + age + sex')) -> pred #interaction between rs drivers age and sex
Sign up to request clarification or add additional context in comments.

7 Comments

Doing this returns results as follows for each SNP (or rs#s); Call: glm(formula = as.formula(paste0("casecontrol ~ ", "rs", i)), family = binomial, data = mydata) Coefficients: (Intercept) rs1 -0.11215 -0.09693 Degrees of Freedom: 499 Total (i.e. Null); 498 Residual Null Deviance: 687.7 Residual Deviance: 687 AIC: 691 How do I get complete beta estimate, SE, p vlaues summary for each rs?
Thank you, looks great. Would the model work if my rs are not named from 1 to 30 and instead each rs has different name e.g., rs1234_T rs4567_C rs1471633_C rs11264341_T rs2078267_T rs478607_A rs642803_T rs653178_T rs3741414_T rs1394125_A rs6598541_G rs7193778_T rs7188445_A rs7224610_A rs2079742_C rs164009_G rs17050272_A rs2307394_C How do I insert such rs# in the model you suggested please? And where should I add factors to adjust for? like age + sex?
@Tabbi, check my edit. Please accept the answer if this is in line with your expectations.
Thank you. Can you help me with the last bit about adding factors to adjust in the model e.g., age + sex and then I can get the adjusted results with the same command I believe?!
Thank you so very much. You are great!!
|
0

you can do something like this

outcome<-mydata %>% select("casecontrol")  #select outcome

features <- mydata %>% select("rs1":"rs30") #select features

features_names<-data.frame(colnames(features)) #get feature names

for(i in 1:nrow(features_names))     # loop over length of features_name
{selected=features[,i,drop=FALSE]    #get each feature 
total <- cbind(selected, response)   # combine with outcome
glm.fit <- glm(casecontrol ~ ., data = total, family = binomial("logit")) 
summary(glm.fit)
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.