1

I am quite new to R and need help running a set of commands in a loop.

I have a big table ("variable_table.txt") with columns as below:

sample  BMI   var1_LRR  var1_BAF  var2_LRR  var2_BAF  var3_LRR  var3_BAF  ........  var200_LRR  var200_BAF
AA      18.9  0.27      0.99      0.18      0.99      0.11      1         ........  0.20        0.99
BB      27.1  0.23      1         0.13      0.99      0.17      1         ........  0.23        0.99

I would like to run a regression command as below:

dataset<- read.table ("variable_table.txt", na.strings="NA", header=TRUE)

linear_var1 <- lm (BMI ~ var1_LRR + var1_BAF,data=dataset)

summary(linear_var1)

confint_var1_CI <- confint(linear_var1, level=0.95)

confint_var1_CI

Question 1: How can I run the commands above for var1, then repeat them for var2, var3, and so on up to var200, without running each one individually?

Question 2: How can I compile the results of every run into a single table?

3 Answers

1

The easiest way would be to subset your data.frame, e.g.

mydata <- data.frame(y = runif(100),
                     foo1 = runif(100), bar1 = runif(100),
                     foo2 = runif(100), bar2 = runif(100))

out <- list()

for (i in 1:2)
  out[[i]] <- lm(y ~., data = mydata[, c("y", paste(c("foo", "bar"), i, sep=""))])

As for saving the output to a table, you first have to decide which part of the output you want to keep (e.g. the coefficients):

mytab <- matrix(NA, 2, 3)
for (i in 1:2)
  mytab[i, ] <- out[[i]]$coefficients

You can also use the broom package to extract "tidy" output from lm objects.

library(broom)
tidy(out[[1]])
##          term   estimate  std.error statistic           p.value
## 1 (Intercept)  0.5060922 0.07619095  6.642419 0.000000001794162
## 2        foo1 -0.1567166 0.10023700 -1.563461 0.121201059993118
## 3        bar1  0.1578192 0.10404012  1.516907 0.132542574934363

Next, you can combine those outputs using rbind.
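The combining step can be sketched in base R roughly as follows. This is a minimal sketch rather than part of the original answer: it rebuilds the simulated `mydata` from above, the seed is added only for reproducibility, and the column names of `results` (`model`, `term`, `estimate`, `std.error`, `p.value`, `conf.low`, `conf.high`) are illustrative choices.

```r
set.seed(42)  # assumption: seed added only to make the sketch reproducible
mydata <- data.frame(y = runif(100),
                     foo1 = runif(100), bar1 = runif(100),
                     foo2 = runif(100), bar2 = runif(100))

# Fit one model per variable pair, as in the loop above
out <- lapply(1:2, function(i) {
  lm(y ~ ., data = mydata[, c("y", paste0(c("foo", "bar"), i))])
})

# Turn each fit into a small data frame: estimates plus 95% confidence limits
rows <- lapply(seq_along(out), function(i) {
  est <- coef(summary(out[[i]]))          # coefficient matrix from summary.lm
  ci  <- confint(out[[i]], level = 0.95)  # matrix with lower and upper limits
  data.frame(model     = i,
             term      = rownames(est),
             estimate  = est[, "Estimate"],
             std.error = est[, "Std. Error"],
             p.value   = est[, "Pr(>|t|)"],
             conf.low  = ci[, 1],
             conf.high = ci[, 2],
             row.names = NULL)
})

# Stack the per-model data frames into one table
results <- do.call(rbind, rows)
results
```

The resulting table has one row per coefficient per model, so it can be written out with `write.table(results, ...)` if a file is needed.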


0

You might try something like this:

for ( i in 1:200 ) {

  # build the formula
  form<-as.formula(paste("BMI ~ var", i, "_LRR + var", i, "_BAF", sep=""))

  # make a character string with the lm-instruction, using the formula above
  code.lm<-paste("lm.V", i, "<-lm(form, data=dataset)", sep="")
  # dynamically execute the code in that string
  eval(parse(text=code.lm))

  # create a string with the summary code
  code.summ<-paste("summary(lm.V", i, ")", sep="")
  # dynamically execute the string; print() is needed because
  # results are not auto-printed inside a loop
  print(eval(parse(text=code.summ)))

}

I did it up to the 'summary' instruction, but the rest is similar: you 'paste' your code in a character string and then execute it with 'eval(parse(text=))'.

After this you can access the variables 'lm.V1', ..., 'lm.V200'.
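An alternative that avoids building code as strings is to keep the fitted models in a named list. The following is a sketch of that different technique, not the method shown above. It assumes the `varN_LRR` / `varN_BAF` column naming from the question; the small simulated `dataset`, the value of `n_vars`, and the list name `fits` are made up for illustration.

```r
set.seed(1)
n_vars <- 3  # hypothetical; the question has 200 variable pairs

# Simulated stand-in for the question's table (assumption, not real data)
dataset <- data.frame(BMI = rnorm(50, mean = 25, sd = 4))
for (i in seq_len(n_vars)) {
  dataset[[paste0("var", i, "_LRR")]] <- rnorm(50)
  dataset[[paste0("var", i, "_BAF")]] <- runif(50)
}

# Fit one model per variable pair and collect the fits in a list
fits <- lapply(seq_len(n_vars), function(i) {
  # reformulate() builds BMI ~ vari_LRR + vari_BAF from character names
  form <- reformulate(paste0("var", i, c("_LRR", "_BAF")), response = "BMI")
  lm(form, data = dataset)
})
names(fits) <- paste0("lm.V", seq_len(n_vars))

# Individual models are then reached by name
summary(fits[["lm.V1"]])
confint(fits$lm.V2, level = 0.95)
```

Keeping the models in one list means later steps (summaries, confidence intervals, combining results) can be applied with `lapply` instead of further `eval(parse(text=))` calls.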

0

You'll have a much easier time working with the data frame if you rearrange it first:

library(tidyr)
# gather all columns into a single column
tidied <- gather(dataset, var, value, -sample, -BMI)

# separate the "var" column into varnum (var1, var2...) and variable
tidied <- separate(tidied, var, c("var", "variable"))

# now spread the two variables (BAF and LRR) back across columns
tidied <- spread(tidied, variable, value)

You'll end up with a table, tidied, that has five columns: sample, BMI, var (which is var1, var2, etc.), LRR, and BAF. It will have 200 times as many rows as your current table. Note that with the %>% operator, you can do the above steps as:

library(dplyr)
tidied <- dataset %>%
  gather(var, value, -sample, -BMI) %>%
  separate(var, c("var", "variable")) %>%
  spread(variable, value)
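In tidyr 1.0.0 and later, `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`, and the same reshaping can be sketched in a single call. This is an illustrative variant rather than the code from this answer; it assumes every column other than `sample` and `BMI` is named `<var>_<measure>` with exactly one underscore, and it uses a small simulated `dataset`.

```r
library(tidyr)

# Simulated stand-in with two variable pairs (assumption, not real data)
set.seed(1)
dataset <- data.frame(sample = 1:100, BMI = rnorm(100),
                      var1_LRR = rnorm(100), var1_BAF = runif(100),
                      var2_LRR = rnorm(100), var2_BAF = runif(100))

# Split each column name at "_": the first part goes into the "var" column,
# and the special ".value" entry turns the second part (LRR / BAF) into
# separate output columns
tidied <- pivot_longer(dataset,
                       cols = -c(sample, BMI),
                       names_to = c("var", ".value"),
                       names_sep = "_")
head(tidied)
```

The result has the same five columns described above (sample, BMI, var, LRR, BAF), so the regression step that follows should work on it unchanged.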

Once you've done that rearrangement, you can very easily perform a linear regression within each var using dplyr's group_by and do, along with broom:

library(broom)
coefs <- tidied %>%
  group_by(var) %>%
  do(tidy(lm(BMI ~ BAF + LRR, data = .), conf.int = TRUE))

For example, if your dataset were:

set.seed(1)
dataset <- data.frame(sample = 1:100, BMI = rnorm(100),
                      var1_LRR = rnorm(100), var1_BAF = runif(100),
                      var2_LRR = rnorm(100), var2_BAF = runif(100),
                      var3_LRR = rnorm(100), var3_BAF = runif(100))

The results of the above code would be:

Source: local data frame [9 x 8]
Groups: var

   var        term      estimate  std.error    statistic   p.value    conf.low conf.high
1 var1 (Intercept)  0.1298513867 0.17715588  0.732978145 0.4653394 -0.22175399 0.4814568
2 var1         BAF -0.0415096698 0.30068830 -0.138048836 0.8904880 -0.63829271 0.5552734
3 var1         LRR  0.0001270982 0.09550805  0.001330759 0.9989409 -0.18942994 0.1896841
4 var2 (Intercept)  0.1064316834 0.18173583  0.585639517 0.5594779 -0.25426363 0.4671270
5 var2         BAF  0.0144181386 0.31656921  0.045544981 0.9637666 -0.61388410 0.6427204
6 var2         LRR -0.0470190629 0.09340229 -0.503403723 0.6158217 -0.23239676 0.1383586
7 var3 (Intercept)  0.0616288934 0.17865709  0.344956329 0.7308741 -0.29295597 0.4162138
8 var3         BAF  0.1045320710 0.31246736  0.334537572 0.7386962 -0.51562914 0.7246933
9 var3         LRR  0.1118595808 0.07714709  1.449952134 0.1502976 -0.04125603 0.2649752
