
I have a dataset, which I'll call dataset1, with a response variable (e.g. Price). I'm hoping to find a good single predictor of Price among the n other variables in the dataset. But if n is large, I can't build and examine all these models by hand, so I was hoping to use something like this:

for (i in names(dataset1)) {
    model = lm(Price~i, dataset1)
    # Do stuff here with model, such as analyze R^2 values.
}

(I thought this would work since replacing the inside of the for loop with print(i) results in the correct names.) The error is as follows:

Error in model.frame.default(formula = Price ~ i, data = dataset1, drop.unused.levels =    TRUE) : 
variable lengths differ (found for 'i')

Does anyone have advice for dealing with the problem regarding how R reads in the i variable? I know how to approach this problem using other software, but I would like to get a sense of how R works.

  • Curious as to why you don't just try all of the variables in an additive model and trim down the model from there using stepAIC or something similar? Commented Mar 6, 2013 at 21:25
  • This is only one idea I had. I could certainly try other methods; I just wanted to get the hang of understanding this kind of R loop. Commented Mar 6, 2013 at 21:34
  • When I do things like this I paste the formula together and then use do.call, as suggested here: stackoverflow.com/a/7668846/210673 Commented Mar 6, 2013 at 21:35
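The formula-pasting idea from the last comment can be sketched roughly as follows. This is only an illustration, not the linked answer's code: the toy dataset1 and its Size and Age columns are made up for the example, and the do.call step is left out, with the pasted formula passed straight to lm instead.

```r
# Hypothetical toy data; the Size and Age columns are assumptions for illustration.
set.seed(1)
dataset1 <- data.frame(Size = rnorm(20), Age = rnorm(20))
dataset1$Price <- 3 * dataset1$Size + rnorm(20)

# Every column except the response is a candidate predictor.
predictors <- setdiff(names(dataset1), "Price")

r2 <- numeric(0)
for (i in predictors) {
    # Paste the column name into a string such as "Price ~ Size",
    # then convert that string into a real formula object.
    f <- as.formula(paste("Price ~", i))
    model <- lm(f, data = dataset1)
    r2[i] <- summary(model)$r.squared
}
print(r2)
```

Because the formula is built from the string held in i, lm sees the actual column name rather than the loop variable.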

2 Answers


I would go for some sort of *apply here personally:

dat <- data.frame(price=1:10,y=10:1,z=1:10)
sapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q)))[2])

 y  z 
-1  1 

or to get a list with full model results:

lapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q))))

$y
            Estimate   Std. Error       t value      Pr(>|t|)
(Intercept)       11 1.137008e-15  9.674515e+15 1.459433e-125
q                 -1 1.832454e-16 -5.457163e+15 1.423911e-123

$z
                Estimate   Std. Error      t value      Pr(>|t|)
(Intercept) 1.123467e-15 2.457583e-16 4.571429e+00  1.822371e-03
q           1.000000e+00 3.960754e-17 2.524772e+16 6.783304e-129

to get the r-squared value as you mentioned:

sapply(dat[2:3], function(q) summary(lm(dat$price ~ q))$r.squared) 

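At the moment you're cycling through the column names, but inside the formula i is just a single character string, not the column it names, which is why lm complains that the variable lengths differ. Try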

for(i in 2:ncol(dataset1)) #assuming Price is column 1

Then refer to

Price ~ dataset1[, i]

in your loop.
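Put together, the index-based loop might look like the sketch below. The toy dataset1 (with Price in column 1 and made-up Size and Age columns) and the r2 vector used to collect results are assumptions for illustration only.

```r
# Hypothetical toy data with Price in column 1, as assumed above.
dataset1 <- data.frame(Price = c(10, 12, 15, 19, 24, 30),
                       Size  = c(1, 2, 3, 4, 5, 6),
                       Age   = c(5, 3, 6, 2, 4, 1))

# One named slot per candidate predictor to hold its R^2.
r2 <- setNames(numeric(ncol(dataset1) - 1), names(dataset1)[-1])

for (i in 2:ncol(dataset1)) {
    # dataset1[, i] is the column itself, so its length matches Price.
    model <- lm(Price ~ dataset1[, i], data = dataset1)
    r2[names(dataset1)[i]] <- summary(model)$r.squared
}
print(r2)
```

One drawback of this form is that the coefficient in each model is labelled dataset1[, i] rather than with the column name, so results are easiest to track by storing them under names(dataset1)[i] as done here.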

But I'm not sure about your approach from a stats perspective.
