Reference a variable inside select function in R

Question

Suppose I have a function that takes a parameter var_name, where var_name is the name of a variable (column) in the data frame:

library(dplyr) 
calculate_mean <- function(data, var_name) {
    lapply(select(data, var_name), mean, na.rm=TRUE)
}

However, I am getting the error:

Error: All select() inputs must resolve to integer column positions.
    The following do not: *  var_name

Pierre L · Accepted Answer · 2015-11-22 04:30:45Z

df <- head(iris)

f <- function(data, var_name) {
  select(data, var_name)
}

f(df, "Petal.Width")
#Error: All select() inputs must resolve to integer column positions.
#The following do not:
#*  var_name

The author of dplyr tends to provide alternative versions of functions that accept character strings as arguments. Try adding an underscore to the function name:

f2 <- function(data, var_name) {
  select_(data, var_name)
}

f2(df, "Petal.Width")
#  Petal.Width
#1         0.2
#2         0.2
#3         0.2
#4         0.2
#5         0.2
#6         0.4
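
If dplyr is not essential for this step, base R's `[` takes column names as strings and uses ordinary evaluation, so the asker's function can be sketched without select at all. This is only an illustration; the name calculate_mean_base is hypothetical.

```r
df <- head(iris)

# Hypothetical base R variant of the asker's calculate_mean().
calculate_mean_base <- function(data, var_name) {
  # data[var_name] indexes by the *value* of var_name and keeps a data
  # frame (a list of columns), so lapply returns one mean per column.
  lapply(data[var_name], mean, na.rm = TRUE)
}

result <- calculate_mean_base(df, "Petal.Width")
```

Because var_name is used as an ordinary value here, a vector of several column names also works.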

Further Explanation

Usually an unquoted name is treated as a variable. If we type x at the console, the evaluator searches the environment for a variable with that name. The same search occurs when the name is passed to a function: with mean(x), the variable x must be defined.

This behavior can become confusing when a function is written not to search for a variable. This is called non-standard evaluation (NSE). Base R has functions that use NSE: for example, subset(df, select = -Petal.Width) returns the data frame without Petal.Width. This convenience makes interactive work easier. select was designed in a similar way.
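
To make the distinction concrete, here is a minimal base R sketch. The helper show_expr is hypothetical and written only for this illustration; it uses substitute() to capture the expression the caller typed instead of evaluating it, which is the kind of mechanism NSE functions rely on.

```r
df <- head(iris)

# Standard evaluation: the argument is evaluated before use, so x must exist.
x <- c(1, 2, 3)
mean_x <- mean(x)

# Non-standard evaluation: substitute() captures the unevaluated expression.
# show_expr is a hypothetical helper, not part of any package.
show_expr <- function(arg) {
  deparse(substitute(arg))
}

# No variable called Petal.Width exists in the global environment, yet this
# call succeeds because the argument is never evaluated.
captured <- show_expr(Petal.Width)

# subset() treats its select argument in the same spirit: names are
# interpreted as columns of df rather than as variables in the environment.
remaining <- names(subset(df, select = -Petal.Width))
```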

When you wrote your function, its arguments were evaluated in the standard way: unquoted names were treated as variables. But inside it you call select, which uses NSE. select looks for a column literally named var_name, even though you expected var_name to be replaced by the user's input. Let's demonstrate the behavior by creating a literal var_name column:

df$var_name <- 1
f(df, "Petal.Width")
#  var_name
#1        1
#2        1
#3        1
#4        1
#5        1
#6        1

The original function with select returned the column var_name, not the column we hoped for. Hadley Wickham created select_ in part to address this discrepancy.
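
Note that select_() has been deprecated since dplyr 0.7.0. In current versions, one way to get the same effect is the all_of() selection helper, sketched below on the assumption that dplyr 1.0.0 or later is installed. The names f3 and calculate_mean2 are hypothetical, chosen to parallel the examples above.

```r
library(dplyr)

df <- head(iris)
df$var_name <- 1  # keep the decoy column from the demonstration above

# all_of() tells select() to treat var_name as a character vector of
# column names taken from the function's environment, not as a column.
f3 <- function(data, var_name) {
  select(data, all_of(var_name))
}

# The asker's function rewritten the same way (hypothetical name).
calculate_mean2 <- function(data, var_name) {
  lapply(select(data, all_of(var_name)), mean, na.rm = TRUE)
}

selected <- f3(df, "Petal.Width")
means <- calculate_mean2(df, c("Sepal.Length", "Petal.Width"))
```

With all_of(), the decoy var_name column is ignored and the requested columns are selected; a name that does not exist in the data raises an error rather than being silently skipped.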

For more information on NSE, see http://adv-r.had.co.nz/Computing-on-the-language.html

edited Nov 22, 2015 at 4:30

answered Nov 22, 2015 at 4:05

Pierre L

2 Comments

Neel Over a year ago

What's the difference between select() and select_()?

Rich Scriven Over a year ago

@Neel - One uses non-standard evaluation (select()), the other standard (select_()).
