Linear Regression loop for each independent variable individually against the dependent variable

Question

I want to figure out how to create a loop, or use one of the apply functions, to get individual 1:1 regression information for each variable in a dataset against the dependent variable.

Let's say I am using mtcars. How would I write R code that takes each variable in the data frame and fits a regression of mpg on it?

Even better would be getting a summary for each independent variable, with some sort of name assignment such as x1=, x2=, etc.

# pseudocode: eachvar stands for each column of mtcars in turn
summary(lm(mpg ~ eachvar, data = mtcars))

A non-standard approach for this problem: see "Fast pairwise simple linear regression between variables in a data frame". The general_paired_simpleLM function there could be useful when all your variables are numeric. (Zheyuan Li, commented Aug 27, 2018 at 1:50)

John · Accepted Answer · 2014-07-30 12:28:25Z

15

This will do it for you.

# Regress mpg on each other column of mtcars in turn; keep the summaries in L
L <- lapply(mtcars[, -1], function(x) summary(lm(mtcars$mpg ~ x)))

A data.frame is a list of columns with some extra attributes, so lapply goes through each column of mtcars except the first (mpg itself) and performs the regressions. If you save the resulting list as something like L, you can access each summary by the column's name in the original data.frame, so L$cyl gives the regression summary for mpg on cyl. Positions are shifted by one, though: L[[1]] is cyl, which is the second column of mtcars, because mpg was dropped.
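The question also asked about an explicit loop. Below is a minimal sketch of the same idea using a plain for loop, assuming the built-in mtcars data. It builds each formula from the column name with reformulate(), so the slope row of each coefficient table is labelled with the real variable name instead of x. The object names predictors and fits are only illustrative.

```r
# Every column of mtcars except the response, mpg
predictors <- setdiff(names(mtcars), "mpg")

# Pre-allocate a named list to hold one summary per predictor
fits <- vector("list", length(predictors))
names(fits) <- predictors

for (v in predictors) {
  # reformulate() turns the string v into the formula mpg ~ <v>
  f <- reformulate(v, response = "mpg")
  fits[[v]] <- summary(lm(f, data = mtcars))
}

# Look up by name; the slope row of the coefficient table is labelled "wt"
fits$wt
coef(fits$wt)
```

Pre-allocating the named list keeps the loop from growing an object on each iteration, and it makes fits$wt and fits[["wt"]] work the same way L$cyl does above.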

answered Jul 30, 2014 at 12:28

John



1 Comment

David Arenburg Over a year ago

Actually, this one makes more sense. You could also easily do things like lapply(L, function(x) x$r.squared) or lapply(L, coef).
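Building on that comment, sapply() instead of lapply() simplifies each per-model value into a named vector, and those vectors can then be collected into a single comparison table. The sketch below assumes L is the list of summaries from John's answer, rebuilt here so the block runs on its own. The column names of res are arbitrary choices.

```r
# Summaries of mpg ~ x for every other column of mtcars, as in the answer
L <- lapply(mtcars[, -1], function(x) summary(lm(mtcars$mpg ~ x)))

# coef() on a summary.lm returns the coefficient matrix; the slope row is "x"
res <- data.frame(
  predictor = names(L),
  slope     = sapply(L, function(s) coef(s)["x", "Estimate"]),
  p.value   = sapply(L, function(s) coef(s)["x", "Pr(>|t|)"]),
  r.squared = sapply(L, function(s) s$r.squared),
  row.names = NULL
)

# Predictors ordered from best to worst single-variable fit
res[order(res$r.squared, decreasing = TRUE), ]
```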

David Arenburg · Accepted Answer · 2018-08-09 13:13:38Z

7

A data.table version of John's solution

library(data.table)
Fits <- 
    data.table(mtcars)[, 
              .(MyFits = lapply(.SD, function(x) summary(lm(mpg ~ x)))), 
              .SDcols = -1]

Some explanations of the code

data.table(mtcars) converts mtcars to a data.table object
.SD is also a data.table object; it contains the columns we want to operate on
.SDcols = -1 tells .SD not to use the first column (we don't want to fit lm(mpg ~ mpg))
lapply runs the model over every column in .SD (except the one we skipped) and returns a list, which becomes the list column MyFits

Fits$MyFits is a list of summaries; you can inspect them using

Fits$MyFits

But you can also operate on them, for example by applying the coef function to each fit

Fits[, lapply(MyFits, coef)]

Or get the R-squared values

Fits[, lapply(MyFits, `[[`, "r.squared")]

edited Aug 9, 2018 at 13:13

answered Jul 30, 2014 at 11:59

David Arenburg


8 Comments

Tom Over a year ago

Thanks for this! When I use this solution I get the following error: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases. Any ideas what leads to this error? I want to use this on a rather "dirty" dataset. Could it be that some exceptions are needed? For example, is it possible to add a try statement to this solution to prevent it from blowing up?

David Arenburg Over a year ago

It probably means that the variable has no non-NA values (or none in the rows where mpg is also non-NA), so there is nothing left to fit. You need to clean your data or use tryCatch. Either way, this answer is old and needs an update.

Tom Over a year ago

Thank you for your answer. I thought that at first, but I had already removed all variables where nearly all values (more than 99%) were NA. For my particular (huge) dataset, perhaps it is more likely that there are some non-numeric variables in there? But I guess tryCatch would still be the solution then. I have not used data.table a lot yet. Would it be possible to show me where to incorporate the tryCatch?

David Arenburg Over a year ago

You could simply check that the variable is numeric first, e.g. data.table(mtcars)[, .(MyFits = lapply(.SD, function(x) if(is.numeric(x)) summary(lm(mpg ~ x)))), .SDcols = -1]

Tom Over a year ago

Thank you, I still have some trouble seeing how to apply statements like that. By the way, when I applied your solution to mtcars, I got rows that start like list(call = lm(formula = mpg ~ x), terms = mpg ~ x, residu.. Was this the intended outcome, or is something going wrong there?
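On where a tryCatch could go, here is a minimal base-R sketch (not data.table) that guards each fit. The hypothetical helper fit_or_null skips non-numeric columns and columns with fewer than 3 complete (x, mpg) pairs, since an intercept plus a slope needs at least 3 points to leave a residual degree of freedom. Any other lm() error is turned into NULL. The extra columns all_na and label are invented to simulate a dirty dataset. In the data.table version, the same idea would mean wrapping the lm() call inside the lapply(.SD, ...) function in the same way.

```r
# Copy of mtcars with two hypothetical "dirty" columns added for illustration
dirty <- mtcars
dirty$all_na <- NA_real_              # no usable values: lm() alone would error
dirty$label  <- rownames(mtcars)      # character column, not a numeric predictor

fit_or_null <- function(x, y) {
  # Skip non-numeric columns and columns with fewer than 3 complete pairs
  if (!is.numeric(x) || sum(complete.cases(x, y)) < 3) return(NULL)
  # Anything else that still fails returns NULL instead of stopping lapply
  tryCatch(summary(lm(y ~ x)), error = function(e) NULL)
}

fits <- lapply(dirty[, -1], fit_or_null, y = dirty$mpg)
fits <- Filter(Negate(is.null), fits)   # drop the columns that were skipped
names(fits)
```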


Victorp · Accepted Answer · 2014-07-30 11:55:55Z

3

Hi, try something like this:

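# Build one formula per predictor: mpg~cyl, mpg~disp, ...
models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
# Fit and summarise each model
res.models <- lapply(models, FUN = function(x) {summary(lm(formula = x, data = mtcars))})
# Name each summary after its formula so it can be looked up by name
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")
res.models[["mpg~disp"]]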


# Call:
# lm(formula = x, data = mtcars)

# Residuals:
#     Min      1Q  Median      3Q     Max 
# -4.8922 -2.2022 -0.9631  1.6272  7.2305 

# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
# disp        -0.041215   0.004712  -8.747 9.38e-10 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 3.251 on 30 degrees of freedom
# Multiple R-squared:  0.7183,  Adjusted R-squared:  0.709 
# F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
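In the output above, the Call line reads lm(formula = x, data = mtcars), so the printed summary alone does not say which predictor it belongs to. If that matters (the question asked for some kind of name assignment), one option is to splice the formula object into the call with bquote() before evaluating it, so the stored call records the actual formula. This is only a sketch. The names predictors and res.models are illustrative, and here the list is named by predictor rather than by formula.

```r
predictors <- names(mtcars)[-1]

res.models <- lapply(predictors, function(v) {
  f <- reformulate(v, response = "mpg")          # e.g. mpg ~ disp
  # bquote() splices the formula itself into the call, so the stored call
  # records mpg ~ disp instead of a placeholder symbol such as x or f
  fit <- eval(bquote(lm(.(f), data = mtcars)))
  summary(fit)
})
names(res.models) <- predictors

res.models$disp
```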

answered Jul 30, 2014 at 11:55

Victorp


1 Comment

Harmzy15 Over a year ago

Do you know where poly() would fit in with the summary(lm(formula = x...) component?
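On the poly() question: poly() is part of the model formula. With Victorp's approach, it therefore goes into the string that is turned into a formula, not into the lm() or summary() call. The sketch below assumes a quadratic term (degree 2) is wanted. poly() requires more unique values than the degree, so the 0/1 columns vs and am are filtered out here. The names degree, ok and res.poly are illustrative.

```r
degree <- 2
predictors <- names(mtcars)[-1]

# poly(x, degree) fails unless x has more than `degree` unique values,
# so drop predictors that are too coarse (here the 0/1 columns vs and am)
ok <- sapply(mtcars[predictors], function(x) length(unique(x)) > degree)
predictors <- predictors[ok]

# One formula per predictor, e.g. mpg ~ poly(disp, 2)
models <- lapply(paste0("mpg ~ poly(", predictors, ", ", degree, ")"), formula)

res.poly <- lapply(models, function(f) summary(lm(formula = f, data = mtcars)))
names(res.poly) <- predictors

# Coefficient table for the quadratic fit of mpg on disp
coef(res.poly$disp)
```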
