Making linear models in a for loop using R programming

Question

I have a dataset that I'll call dataset1 with a response variable (e.g. Price). I'm hoping to find a good single predictor of Price among the n other variables in the dataset. But if n is large, I can't manually build and examine all these models, so I was hoping to use something like this:

for (i in names(dataset1)) {
    model = lm(Price~i, dataset1)
    # Do stuff here with model, such as analyze R^2 values.
}

(I thought this would work since replacing the inside of the for loop with print(i) results in the correct names.) The error is as follows:

Error in model.frame.default(formula = Price ~ i, data = dataset1, drop.unused.levels = TRUE) : 
variable lengths differ (found for 'i')

Does anyone have advice for dealing with the problem regarding how R reads in the i variable? I know how to approach this problem using other software, but I would like to get a sense of how R works.

Curious as to why you don't just try all of the variables in an additive model and trim down the model from there using stepAIC or something similar?
– tcash21, Commented Mar 6, 2013 at 21:25
This is only one idea I had. I could certainly try other methods; I just wanted to get the hang of understanding this kind of R loop.
– TakeS, Commented Mar 6, 2013 at 21:34
When I do things like this I paste the formula together and then use do.call, as suggested here: stackoverflow.com/a/7668846/210673
– Aaron - mostly inactive, Commented Mar 6, 2013 at 21:35
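
For anyone wondering why the original loop fails: inside Price ~ i, the name i refers to a length-one character string such as "Size", not to a column, so lm() finds a variable whose length differs from Price. A minimal sketch of the "paste the formula together" idea from the last comment follows. The data and the column names Size and Age are made up for illustration, and this version calls lm() directly rather than going through do.call.

```r
# Hypothetical data standing in for dataset1; Size and Age are invented names.
set.seed(1)
dataset1 <- data.frame(Price = rnorm(20), Size = rnorm(20), Age = rnorm(20))

# Loop over every column name except the response itself.
predictors <- setdiff(names(dataset1), "Price")
r2 <- numeric(0)
for (i in predictors) {
    # Turn the string into a real formula, e.g. Price ~ Size.
    f <- as.formula(paste("Price ~", i))
    model <- lm(f, data = dataset1)
    # Store the R^2 under the predictor's name.
    r2[i] <- summary(model)$r.squared
}
r2
```

Because the formula is built from the column name, each fitted model keeps a readable term label (Size, Age) instead of a generic placeholder.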

user1317221_G · Accepted Answer · 2013-03-06 21:36:27Z

2

I would go for some sort of *apply here personally:

dat <- data.frame(price=1:10,y=10:1,z=1:10)
sapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q)))[2])

 y  z 
-1  1

or to get a list with full model results:

lapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q))))

$y
            Estimate   Std. Error       t value      Pr(>|t|)
(Intercept)       11 1.137008e-15  9.674515e+15 1.459433e-125
q                 -1 1.832454e-16 -5.457163e+15 1.423911e-123

$z
                Estimate   Std. Error      t value      Pr(>|t|)
(Intercept) 1.123467e-15 2.457583e-16 4.571429e+00  1.822371e-03
q           1.000000e+00 3.960754e-17 2.524772e+16 6.783304e-129

to get the r-squared value as you mentioned:

sapply(dat[2:3], function(q) summary(lm(dat$price ~ q))$r.squared)
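
If the goal is to pick the single best predictor, the named vector that sapply returns can be sorted directly. Here is a rough sketch under made-up data: x1 is constructed to track price closely and x2 is pure noise, so neither column comes from the question.

```r
# Made-up data: x1 follows price plus noise, x2 is unrelated noise.
set.seed(42)
dat <- data.frame(price = 1:30)
dat$x1 <- dat$price + rnorm(30)
dat$x2 <- rnorm(30)

# One simple regression per predictor column, keeping only R^2.
r2 <- sapply(dat[c("x1", "x2")],
             function(q) summary(lm(dat$price ~ q))$r.squared)

# Rank predictors from best to worst fit and take the top one.
sort(r2, decreasing = TRUE)
best <- names(which.max(r2))
best
```

R^2 alone is a crude ranking criterion, so treat the result as a starting point for closer inspection rather than a final model choice.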

edited Mar 6, 2013 at 21:36

answered Mar 6, 2013 at 21:30


alexwhan · Accepted Answer · 2013-03-06 21:29:37Z

1

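At the moment i is just a column name (a character string), not the column itself, so Price ~ i compares Price against a length-one string. Try looping over column indices instead: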

for(i in 2:ncol(dataset1)) #assuming Price is column 1

Then refer to

Price ~ dataset1[, i]

in your loop.
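
Put together, the loop might look something like the sketch below. The dataset here is invented (columns A and B are placeholders), and it keeps the answer's assumption that Price is column 1.

```r
# Hypothetical dataset1 with Price in column 1, as the answer assumes.
set.seed(2)
dataset1 <- data.frame(Price = rnorm(15), A = rnorm(15), B = rnorm(15))

# Pre-allocate a named list to hold one model per predictor.
models <- vector("list", ncol(dataset1) - 1)
names(models) <- names(dataset1)[-1]

for (i in 2:ncol(dataset1)) {
    # dataset1[, i] is the column itself, so its length matches Price.
    models[[i - 1]] <- lm(Price ~ dataset1[, i], data = dataset1)
}

# Pull out R^2 for each stored model.
sapply(models, function(m) summary(m)$r.squared)
```

One trade-off with indexing this way: every model labels its slope term dataset1[, i] rather than the column name, which makes the coefficient tables harder to read and makes predict() on new data awkward.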

But I'm not sure about your approach from a stats perspective.

edited Mar 6, 2013 at 21:29

answered Mar 6, 2013 at 21:24
