I have a data frame with many variables. I want to fit a linear regression that explains the last variable with all the others. Since that is too much to type out, I thought about building a string with the independent variables, e.g. Var1 + Var2 +...+ VarK. I did this by pasting "+" after every column name except the last one, with this code:

ExVar <- toString(paste(names(datos)[1:11], "+ ", collapse = ''))

I also had to remove the last "+":

ExVar <- substr(ExVar, 1, nchar(ExVar)-2)

So I copied and pasted the ExVar string within the lm() function and the result looked like this:

m1 <- lm(calidad ~ Var1 + Var2 +...+ VarK)

The question is: Is there any way to use "ExVar" within the lm() function as a string, not as a variable, to have a cleaner code?

For better understanding:

If I use this code:

m1 <- lm(calidad ~ ExVar)

It interprets ExVar as an independent variable.

  • ExVar <- paste(names(datos)[1:11], collapse = ' + ')) avoids need to remove "+". lm can take a string as the formula. Your last line of code isn't working because the whole formula has to be a string, rather than a combination of formula and string. So if your outcome is the variable in, say, the 12th column, you can do: lm(paste(names(datos)[12], " ~ ", ExVar), data=datos). Or, lm(reformulate(ExVar, names(datos)[12]), data=datos). Or you can use a string directly, like "calidad" instead of names(datos)[12]. Commented Dec 21, 2017 at 17:59
  • In my previous comment, the first bit of code has an extra parenthesis. It should be ExVar <- paste(names(datos)[1:11], collapse = ' + '). Commented Dec 21, 2017 at 18:05
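
A minimal sketch of the reformulate() suggestion from the comments above. The data frame here is made up (random numbers, with 11 columns named Var1 to Var11 plus calidad) and only stands in for the asker's datos; the name `predictors` is likewise just for illustration.

```r
# Hypothetical stand-in for `datos`: 11 predictors plus the outcome `calidad`.
set.seed(1)
datos <- as.data.frame(matrix(rnorm(50 * 12), nrow = 50))
names(datos) <- c(paste0("Var", 1:11), "calidad")

# reformulate() builds a formula from a character vector of term labels,
# so there is no need to paste "+" signs by hand.
predictors <- names(datos)[1:11]
f <- reformulate(predictors, response = "calidad")

# Fit the model with the generated formula.
m1 <- lm(f, data = datos)
```

Printing f shows the expanded formula, which is a quick way to check that the right columns ended up on each side of the ~.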

2 Answers

The following will all produce the same results. I am providing multiple methods because there are simpler ways of doing what you are asking (see examples 2 and 3) than writing the expression as a string.

First, I will generate some example data:

n <- 100
p <- 11
dat <- array(rnorm(n*p),c(n,p))

dat <- as.data.frame(dat)
colnames(dat) <- paste0("X",1:p)

If you really want to specify the model as a string, this example code will help:

ExVar <- paste(names(dat)[2:11], collapse = " + ")  # "X2 + X3 + ... + X11"
model1 <- paste("X1 ~", ExVar)
fit1 <- lm(eval(parse(text = model1)),data = dat)

Otherwise, note that the 'dot' notation will specify all other variables in the model as predictors.

fit2 <- lm(X1 ~ ., data = dat)

Or, if your data are stored as a matrix, you can select the outcome and the predictors by column position.

dat <- as.matrix(dat)
fit3 <- lm(dat[,1] ~ dat[,-1])

All three of these fit objects have the same coefficient estimates (fit3 only labels them differently):

fit1
fit2
fit3
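
To check that claim rather than compare the printed output by eye, here is a self-contained sketch that refits the three models on fresh simulated data and compares the coefficients numerically. The seed and the names `m`, `same12` and `same23` are illustrative assumptions, not part of the answer above.

```r
# Simulated data, as in the example above (the seed is arbitrary).
set.seed(42)
n <- 100
p <- 11
dat <- as.data.frame(array(rnorm(n * p), c(n, p)))
colnames(dat) <- paste0("X", 1:p)

# 1. Formula built from a string.
model1 <- paste("X1 ~", paste(names(dat)[2:11], collapse = " + "))
fit1 <- lm(eval(parse(text = model1)), data = dat)

# 2. Dot notation.
fit2 <- lm(X1 ~ ., data = dat)

# 3. Matrix columns selected by position.
m <- as.matrix(dat)
fit3 <- lm(m[, 1] ~ m[, -1])

# The coefficient names differ for fit3, so compare the values only.
same12 <- all.equal(unname(coef(fit1)), unname(coef(fit2)))
same23 <- all.equal(unname(coef(fit2)), unname(coef(fit3)))
```

Both same12 and same23 should be TRUE, since the three calls describe the same design matrix.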
1 Comment

Or use as.formula().
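
A short sketch of what that comment suggests, assuming the same kind of simulated data as in the answer (columns X1 to X11, with X1 as the outcome). as.formula() converts the string directly, so eval(parse()) is not needed.

```r
# Simulated data with columns X1..X11 (the seed is arbitrary).
set.seed(7)
dat <- as.data.frame(matrix(rnorm(100 * 11), 100, 11))
colnames(dat) <- paste0("X", 1:11)

# Build the model as a string, then convert it with as.formula().
model1 <- paste("X1 ~", paste(names(dat)[-1], collapse = " + "))
fit1 <- lm(as.formula(model1), data = dat)
```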
2

If you have a data frame and you want to explain the last column using all the others, you can use the code below:

 lm(calidad~.,dat)

or you can use

 lm(rev(dat))  # Only if the last column is your response variable

Either of the two above will give you the results you need.

To do it your way:

 EXV <- as.formula(paste0("calidad ~ ", paste0(names(dat)[-12], collapse = " + ")))
 lm(EXV, data = dat)

There is no need to do it this way, since lm() builds the same model itself when you use the first snippet above.
