I want to create several variables using a formula with R data.table. I have a vector of variable names, and for each one I want to perform a calculation and create a new variable, pasting the same string onto each column name. I can get it to work for one variable at a time, but not inside lapply() or a for loop. I suspect I am missing something about how data.table treats quoted strings versus variable names. Do I need to use ".." or wrap the expression in eval()? A dplyr (or any tidyverse) solution would also solve the issue.

Here is example code with mtcars:

library(data.table)
mtcars.dt <- as.data.table(mtcars)  # copy; setDT(mtcars) may fail because mtcars lives in the locked datasets package
myVars <- c("mpg", "hp", "qsec")

# Doesn't work:
for( myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}

# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])

# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]

# Doesn't work
for (myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              myVar - 
              lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}

# Doesn't work
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") := 
                                       x - 
                                       lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])

# Works
mtcars.dt[, mpg.disp.lm.adj := 
            mpg - 
            lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]

For the ratio calculation, I get the following error:

Error in myVar/disp : non-numeric argument to binary operator 

For the lm adjustment, I get the following error:

Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'disp')
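Both error messages come from what myVar actually is inside j. The quick diagnostic below shows the difference between the name and the column. It is a sketch run on a fresh copy, dt, made with as.data.table(mtcars):

```r
library(data.table)

dt <- as.data.table(mtcars)
myVar <- "mpg"

# myVar is not a column, so j finds it in the calling scope: it is just a string
dt[, class(myVar)]        # "character"
dt[, length(myVar)]       # 1, while disp has nrow(dt) values

# get() looks the name up among the columns of dt instead
dt[, class(get(myVar))]   # "numeric"
```

A length-1 character value explains both messages. Dividing it by disp is arithmetic on a string. Used as the response in myVar ~ disp, it supplies 1 value against 32 for disp.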

Answer

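Inside j, myVar is just a character string such as "mpg", not the mpg column. We can use get() to look up the column whose name is stored in myVar: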

library(data.table)
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}
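The get() pattern also works when the loop is wrapped in a function, because := modifies the data.table by reference. The sketch below uses a hypothetical helper, add_disp_ratios(), which is not part of data.table. It assumes no column in the table is named var_name or suffix, since those names would shadow the function's arguments inside j:

```r
library(data.table)

# Hypothetical helper (not a data.table function): for each name in vars,
# add a column "<name><suffix>" equal to that column divided by disp
add_disp_ratios <- function(dt, vars, suffix = ".disp.ratio") {
  for (var_name in vars) {
    # get(var_name) looks up the column named by the string var_name inside j
    dt[, paste0(var_name, suffix) := get(var_name) / disp]
  }
  invisible(dt)  # dt is already modified in place; returned for convenience
}

mtcars.dt <- as.data.table(mtcars)
add_disp_ratios(mtcars.dt, c("mpg", "hp", "qsec"))
names(mtcars.dt)
```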

Or convert the string to a symbol with as.name() and evaluate it with eval():

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := eval(as.name(myVar)) / disp]
}
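A related variation on the symbol idea builds the entire j expression first and then passes it to data.table as a single top-level eval(). This is an alternative pattern, not the answer's exact code. bquote() splices the symbols in, so data.table receives an ordinary expression such as mpg.disp.ratio := mpg / disp. The variable name j_expr is only illustrative:

```r
library(data.table)

mtcars.dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # Build e.g. mpg.disp.ratio := mpg / disp with real symbols, not strings
  j_expr <- bquote(.(as.name(paste0(myVar, ".disp.ratio"))) := .(as.name(myVar)) / disp)
  # A single top-level eval() in j is replaced by the stored call,
  # which data.table then runs as a normal := assignment
  mtcars.dt[, eval(j_expr)]
}
```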

Another option is to list the columns in .SDcols, loop over .SD (the Subset of Data of the data.table) with lapply(), apply the transformation to each column, and create all the new variables at once by assignment (:=):

mtcars.dt[, paste0(myVars, ".disp.ratio") := lapply(.SD, `/`, disp), 
             .SDcols = myVars]
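The question also asks for a tidyverse route, so here is a dplyr sketch of the same ratio calculation. It assumes dplyr 1.0.0 or later (for across() and its .names argument) and works on the plain mtcars data frame rather than the data.table:

```r
library(dplyr)

myVars <- c("mpg", "hp", "qsec")

# across() applies the lambda to every column named in myVars;
# .x is the current column, and disp refers to the disp column of the data
mtcars.ratio <- mtcars %>%
  mutate(across(all_of(myVars), ~ .x / disp, .names = "{.col}.disp.ratio"))
```

The .names specification "{.col}.disp.ratio" gives the new columns the same names as the data.table versions.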

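For the second case, build the formula as a string with paste(). lm() converts the string to a formula internally, and get(myVar) supplies the column values: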

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              get(myVar) - 
              lm(data = .SD, formula = paste(myVar,  "~ disp"))$coefficients[2] *
               (disp - mean(disp))]
}
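Two alternative sketches of the lm adjustment follow. Neither is the answer's code. The first builds the formula with stats::reformulate() instead of paste() and extracts the slope by name with coef(fit)[["disp"]]. The second drops the loop and uses the .SDcols/lapply() pattern from the ratio case, fitting one model per column inside an anonymous function. Both should give the same columns:

```r
library(data.table)

myVars <- c("mpg", "hp", "qsec")

# Variant 1: loop, with reformulate() building e.g. mpg ~ disp from strings
mtcars.dt <- as.data.table(mtcars)
for (myVar in myVars) {
  fit <- lm(reformulate("disp", response = myVar), data = mtcars.dt)
  slope <- coef(fit)[["disp"]]
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := get(myVar) - slope * (disp - mean(disp))]
}

# Variant 2: no loop; y is each column of .SD in turn, and disp is
# available inside the function because j refers to it
mtcars.dt2 <- as.data.table(mtcars)
mtcars.dt2[, paste0(myVars, ".disp.lm.adj") := lapply(.SD, function(y) {
  slope <- coef(lm(y ~ disp))[["disp"]]
  y - slope * (disp - mean(disp))
}), .SDcols = myVars]
```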