8

I want to figure out how to write a loop, or use one of the apply functions, to get individual 1:1 regression results for each variable in a dataset against the dependent variable.

Let's say I am using mtcars. How would I write R code that takes each variable in the data frame and regresses mpg on it?

Even better would be getting a summary for each independent variable, with some sort of name assignment such as x1=, x2=, etc.

summary(lm(mpg~eachvar,data=mtcars))

3 Answers

15

This will do it for you.

lapply( mtcars[,-1], function(x) summary(lm(mtcars$mpg ~ x)) )

A data.frame is a list with some extra features, so this goes through each column of mtcars except the first and runs the regression on it. If you save the resulting list as, say, L, you can access each summary by the column's name, or by its position (shifted by one relative to the original data.frame, since mpg was dropped). So L$cyl gives the regression summary for mpg on cyl.
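
As a minimal sketch of what that list makes possible (the names L and slopes are illustrative, and this is only one way to do it), the per-variable summaries can be collapsed into one small table of slope statistics and R-squared values:

```r
# Fit mpg against every other column; lapply keeps the column names.
L <- lapply(mtcars[, -1], function(x) summary(lm(mtcars$mpg ~ x)))

# coef() on a summary.lm returns a matrix; row "x" holds the predictor.
# Append R-squared and stack one row per variable.
slopes <- t(sapply(L, function(s) c(coef(s)["x", ], r.squared = s$r.squared)))

round(slopes, 4)
```

Each row of slopes is named after the predictor, so slopes["cyl", ] gives the estimate, standard error, t value, p-value and R-squared for mpg on cyl.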


1 Comment

Actually this one makes more sense. You could also easily do things like lapply(L, function(x) x$r.squared) ; lapply(L, coef)
7

A data.table version of John's solution

library(data.table)
Fits <- 
    data.table(mtcars)[, 
              .(MyFits = lapply(.SD, function(x) summary(lm(mpg ~ x)))), 
              .SDcols = -1]

Some explanations of the code

  • data.table will convert mtcars to a data.table object
  • .SD is also a data.table object which contains the columns one wants to operate on
  • .SDcols = -1 tells .SD not to use the first column (as we don't want to fit lm(mpg ~ mpg))
  • lapply just runs the model over all the columns in .SD (except the one we skipped) and returns a list

Fits will hold a list of summaries in its MyFits column; you can inspect them using

Fits$MyFits

But you can also operate on them, for example by applying the coef function to each fit

Fits[, lapply(MyFits, coef)]

Or getting the r.squared

Fits[, lapply(MyFits, `[[`, "r.squared")]

8 Comments

Thanks for this! When I use this solution I get the following error: "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases". Any idea what leads to this error? I want to use this on a rather "dirty" dataset. Could it be that some exceptions are needed? Is it, for example, possible to add a try statement to this solution to prevent it from blowing up?
It probably means that all the values in one of your columns are NA. You need to clean your data or use tryCatch. Either way, this answer is old and needs an update.
Thank you for your answer. I thought that at first, but I removed all variables where all values (more than 99%) were NA. For my particular (huge) dataset perhaps it is more likely that there are some non-numerical variables in there? But I guess then tryCatch would still be the solution. I have not used data.table a lot yet. Would it be possible to show me where to incorporate the tryCatch?
You could simply check that the variable is numeric first, e.g. data.table(mtcars)[, .(MyFits = lapply(.SD, function(x) if(is.numeric(x)) summary(lm(mpg ~ x)))), .SDcols = -1]
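
A base-R sketch of how the is.numeric check and tryCatch could be combined, for anyone with the same problem. The helper safe_fit and the dirty data frame are hypothetical names made up for this illustration; they are not part of the answer above:

```r
# Hypothetical helper: returns a summary, or NULL when the column
# cannot be used (non-numeric, all NA, or lm() fails for another reason).
safe_fit <- function(x, y) {
  if (!is.numeric(x)) return(NULL)
  tryCatch(summary(lm(y ~ x)), error = function(e) NULL)
}

# A deliberately "dirty" copy of mtcars, for illustration only.
dirty <- mtcars
dirty$all_na <- NA_real_          # triggers the "0 (non-NA) cases" error
dirty$label  <- rownames(mtcars)  # a non-numeric column

fits <- lapply(dirty[, -1], safe_fit, y = dirty$mpg)
fits <- Filter(Negate(is.null), fits)  # drop the columns that were skipped
names(fits)
```

Returning NULL and filtering afterwards keeps the good fits and silently drops the bad columns; if you would rather know which columns failed, compare names(fits) with the original column names.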
Thank you, I still have some trouble seeing how to apply statements like that. By the way, when I applied your solution to mtcars I got rows which start like list(call = lm(formula = mpg ~ x), terms = mpg ~ x, residu.. Was this the intended outcome or is something going wrong there?
3

Hi, try something like this:

models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, FUN = function(x) {summary(lm(formula = x, data = mtcars))})
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")
res.models[["mpg~disp"]]


# Call:
# lm(formula = x, data = mtcars)

# Residuals:
#     Min      1Q  Median      3Q     Max 
# -4.8922 -2.2022 -0.9631  1.6272  7.2305 

# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
# disp        -0.041215   0.004712  -8.747 9.38e-10 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 3.251 on 30 degrees of freedom
# Multiple R-squared:  0.7183,  Adjusted R-squared:  0.709 
# F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
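
Note that the Call line in the output above shows formula = x rather than the actual formula. A possible variation, sketched here under the assumption that you want the real formula recorded in each summary, builds the formulas with reformulate() and calls lm through do.call (the names predictors and res are illustrative):

```r
predictors <- setdiff(names(mtcars), "mpg")

res <- lapply(predictors, function(v) {
  f <- reformulate(v, response = "mpg")  # builds mpg ~ v from strings
  # do.call() passes the formula object itself, so the stored call
  # records mpg ~ disp etc. instead of a placeholder symbol;
  # quote(mtcars) keeps the data argument short in the printed call.
  summary(do.call("lm", list(formula = f, data = quote(mtcars))))
})
names(res) <- predictors

res$disp$call
```

The coefficients are the same as before; only the recorded call differs.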

1 Comment

Do you know where using poly would fit in with the summary(lm(formula = x...) component?
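
One way poly could fit in, as a rough sketch: wrap the variable name in poly(..., 2) when building the formula string. The subset of predictors is an assumption made for this example, because poly of degree 2 needs more than two distinct values and would fail on binary columns such as vs and am:

```r
# Illustrative subset: continuous predictors only.
predictors <- c("disp", "hp", "wt")

quad <- lapply(predictors, function(v) {
  # e.g. mpg ~ poly(disp, 2): orthogonal linear and quadratic terms
  f <- as.formula(paste0("mpg ~ poly(", v, ", 2)"))
  summary(lm(f, data = mtcars))
})
names(quad) <- predictors

coef(quad$disp)  # intercept plus the two polynomial terms
```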
