I want to loop through the vars in a dataframe, calling lm() on each one, and so I wrote this:

findvars <- function(x = samsungData, dv = 'activity', id = 'subject') {
  # Loops through the possible predictor vars, does an lm() predicting the dv
  # from each, and returns a data.frame of coefficients, one row per IV.
  r <- data.frame()
  # All varnames apart from the dependent var, and the case identifier
  ivs <- setdiff(names(x), c(dv, id))
  for (iv in ivs) {
    print(paste("trying", iv))
    m <- lm(dv ~ iv, data = x, na.rm = TRUE)
    # Take the absolute value of the coefficient, then transpose.
    c <- t(as.data.frame(sapply(m$coefficients, abs)))
    c$iv <- iv # which IV produced this row?
    r <- c(r, c)
  }
  return(r)
}

This doesn't work, I believe because the formula in the lm() call consists of function-local variables that hold strings naming variables in the passed-in data frame (e.g., "my_dependent_var" and "this_iv"), as opposed to pointers to the actual variable objects.

I tried wrapping that formula in eval(parse(text = )), but could not get that to work.

If I'm right about the problem, can someone explain to me how to get R to resolve the contents of those vars iv & dv into the pointers I need? Or if I'm wrong, can someone explain what else is going on?

Many thanks!

Here is some repro code:

library(datasets)
data(USJudgeRatings)
findvars(x = USJudgeRatings, dv = 'CONT', id = 'DILG')
  • ?reformulate (you can search SO for that keyword) Commented Dec 4, 2013 at 4:07
  • In R you should forget about "pointers to the objects". Values are passed ... as values. And the "variables" in formulas are not really "strings". Names and symbols are objects of super-class "language". Character vectors are not. Commented Dec 4, 2013 at 5:45
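
To make the two comments above concrete, here is a minimal sketch using the USJudgeRatings data from the question's repro code. It shows that `dv ~ iv` is captured literally (its terms are the symbols `dv` and `iv`, not the strings stored in them), and that reformulate() builds the intended formula from character strings. The names `f_literal`, `f_built` and `m` are only illustrative.

```r
library(datasets)

dv <- "CONT"
iv <- "INTG"

# A formula is not evaluated: its terms are the symbols `dv` and `iv`,
# not the column names held in those variables.
f_literal <- dv ~ iv
all.vars(f_literal)      # "dv" "iv"
class(f_literal[[2]])    # "name", i.e. a symbol, not a character vector

# reformulate() builds a formula from character strings.
f_built <- reformulate(termlabels = iv, response = dv)
all.vars(f_built)        # "CONT" "INTG"

# The built formula refers to real columns, so lm() can fit it.
m <- lm(f_built, data = USJudgeRatings)
names(coef(m))           # "(Intercept)" "INTG"
```
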

1 Answer

So there's enough bad stuff happening in your function besides your trouble with the formula, that I think someone should walk you through it all. Here are some annotations, followed by a better version:

findvars <- function(x = samsungData, dv = 'activity', id = 'subject') {
  # For small examples, "growing" objects isn't a huge deal,
  # but you will regret it very, very quickly. It's a bad
  # habit. Learn to ditch it now. So don't initialize
  # empty lists and data frames.
  r <- data.frame()

  ivs <- setdiff(names(x), c(dv, id))
  for (iv in ivs) {
    print(paste("trying", iv))
    # There is no na.rm argument to lm(), only na.action.
    # Also, dv ~ iv is taken literally: the formula refers to variables named
    # dv and iv, not to the column names stored in them.
    m <- lm(dv ~ iv, data = x, na.rm = TRUE)
    # Best not to name variables c; it's a common function, and you call it a few lines down!
    # Also, use the coef() extractor function, not $. That way, if/when
    # authors change the object structure, your code won't break.
    # Finally, abs() is vectorized, so there's no need for sapply().
    c <- t(as.data.frame(sapply(m$coefficients, abs)))
    #This is probably best stored in the name
    c$iv <- iv # which IV produced this row?
    #Growing objects == bad! Also, are you sure you know what happens when
    # you concatenate two data frames?
    r <- c(r, c)
  }
  return(r)
}

Try something like this instead:

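findvars <- function(x, dv, id) {
  # All variable names apart from the dependent variable and the case identifier
  ivs <- setdiff(names(x), c(dv, id))
  # Initialize a named result list of the appropriate length
  result <- setNames(vector("list", length(ivs)), ivs)
  for (i in seq_along(ivs)) {
    # lm() coerces the pasted string (e.g. "CONT~INTG") to a formula
    result[[i]] <- abs(coef(lm(paste(dv, ivs[i], sep = "~"), data = x, na.action = na.omit)))
  }
  result
}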
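
If you want the data frame with one row per IV that the question describes, rather than a list, something along these lines might work. This is only a sketch: `findvars_df` is a made-up name, it swaps paste() for reformulate(), and it assumes every predictor is numeric so that each fit has exactly two coefficients (intercept and slope). A factor predictor would make vapply() fail.

```r
library(datasets)

# Hypothetical variant of findvars() that returns a data frame.
# Assumption: all predictors are numeric, so coef() has length 2 per fit.
findvars_df <- function(x, dv, id) {
  ivs <- setdiff(names(x), c(dv, id))
  # One column per predictor: |intercept| in row 1, |slope| in row 2.
  coefs <- vapply(ivs, function(iv) {
    m <- lm(reformulate(iv, response = dv), data = x, na.action = na.omit)
    unname(abs(coef(m)))
  }, numeric(2))
  # Reshape the 2 x length(ivs) matrix into one row per predictor.
  data.frame(iv = ivs,
             intercept = unname(coefs[1, ]),
             slope = unname(coefs[2, ]))
}

res <- findvars_df(USJudgeRatings, dv = "CONT", id = "DILG")
head(res)
```

Using vapply() keeps the "no growing objects" advice: the result is allocated once, and the declared numeric(2) shape makes it fail loudly if a fit returns something unexpected.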

1 Comment

Holy cow--thank you so much for breaking all that down! That of course works, and I have some inkling why it works--at least I think so. I'm comforted to know that lm() will take a string for the formula argument--I guess part of my problem was wrapping my paste() call in eval()? Or something. At any rate--thanks for being so generous w/your time!
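
On the string-versus-eval() point in this comment, a small sketch (again with USJudgeRatings, and with illustrative variable names) suggests that the pasted string, an explicit as.formula() and the eval(parse(text = )) route all end up fitting the same model, so the extra wrapping is not needed:

```r
library(datasets)

dv <- "CONT"
iv <- "INTG"
f_string <- paste(dv, iv, sep = " ~ ")   # "CONT ~ INTG"

# lm() coerces a character string to a formula internally.
m_string  <- lm(f_string, data = USJudgeRatings)
# Explicit conversion gives the same fit.
m_formula <- lm(as.formula(f_string), data = USJudgeRatings)
# eval(parse(text = )) also yields a formula, but is unnecessary here.
m_eval    <- lm(eval(parse(text = f_string)), data = USJudgeRatings)

all.equal(coef(m_string), coef(m_formula))
all.equal(coef(m_string), coef(m_eval))
```
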
