I think this is what you're looking for. The easiest way to refer to columns of a data frame programmatically is to use quoted column names. In principle, what you're doing is this:
data[, "weight"] / data[, "height"]^2
but inside a function you might want to let the user specify that the height or weight column is named differently, so you can write your function like this:
add_bmi = function(data, height_col = "height", weight_col = "weight") {
  # BMI is weight divided by height squared
  data$bmi = data[, weight_col] / data[, height_col]^2
  return(data)
}
This function will assume that the columns to use are named "height" and "weight" by default, but the user can specify other names if necessary. You could do a similar solution using column indices instead, but using names tends to be easier to debug.
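As a rough sketch of how this might be used with differently named columns, here is a variant with a few input checks. The checks, the switch to `[[`, and the `patients` data with its `ht_m` and `wt_kg` columns are my own assumptions for illustration, not something the question requires.

```r
# Sketch of add_bmi with basic input checks (the checks are an addition)
add_bmi <- function(data, height_col = "height", weight_col = "weight") {
  stopifnot(is.data.frame(data),
            height_col %in% names(data),
            weight_col %in% names(data))
  # [[ ]] extracts a single column as a vector, by name or by index
  data$bmi <- data[[weight_col]] / data[[height_col]]^2
  data
}

# Hypothetical data whose columns do not use the default names
patients <- data.frame(ht_m = c(1.60, 1.80), wt_kg = c(60, 81))
result <- add_bmi(patients, height_col = "ht_m", weight_col = "wt_kg")
print(result)
```

Failing early with stopifnot tends to give a clearer error than the "undefined columns selected" message you would otherwise get from a misspelled column name.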
Functions this simple are rarely useful. If you're calculating BMI for a lot of datasets maybe it is worth keeping this function around, but since it is a one-liner in base R you probably don't need it.
my_data$BMI = with(my_data, weight / height^2)
One note is that using column names stored in variables means you can't use $. This is the price we pay for making things more programmatic, and using [ and [[ instead is a good habit to form for such applications. See fortunes::fortune(343):
Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of
R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to
acquire the '[[' and '[' habit early.
-- Peter Ehlers (about the use of $-extraction)
R-help (March 2013)
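To make the point about $ concrete, here is a small sketch. The `people` data frame and the `col` variable are made up for illustration.

```r
# Hypothetical data for illustration
people <- data.frame(height = c(1.6, 1.8), weight = c(60, 81))
col <- "weight"

# `$` takes its argument literally: it looks for a column named "col"
dollar_result <- people$col        # NULL, there is no such column

# `[[` and `[` evaluate the variable, so they find the "weight" column
bracket_result <- people[[col]]    # the weight values
matrix_style   <- people[, col]    # same values, matrix-style indexing

print(is.null(dollar_result))
print(bracket_result)
```

Note that the $ version fails silently by returning NULL rather than raising an error, which is part of why it bites people.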
For fancier usage, like what dplyr does, where you don't have to quote column names (and can evaluate expressions), the lazyeval package makes things relatively painless and has very nice vignettes.
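To give a feel for what unquoted column names involve, here is a minimal sketch using base R's substitute() and eval(). To be clear, this is not the lazyeval API and not how dplyr is implemented; it is just the simplest base R mechanism I'm sure of, and `add_bmi_nse` is a hypothetical name.

```r
# Hypothetical helper: accepts unquoted column names via substitute()/eval()
add_bmi_nse <- function(data, height, weight) {
  # substitute() captures the expression the caller typed (e.g. ht),
  # eval() then looks it up in the data frame first, the caller second
  h <- eval(substitute(height), data, parent.frame())
  w <- eval(substitute(weight), data, parent.frame())
  data$bmi <- w / h^2
  data
}

people <- data.frame(ht = c(1.6, 1.8), wt = c(60, 81))
nse_result <- add_bmi_nse(people, height = ht, weight = wt)
print(nse_result)
```

Because the arguments are expressions, something like height = ht / 100 also works, which is the "can evaluate expressions" part.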
The base function with also does a simple form of non-standard evaluation, e.g.,
with(mtcars, plot(disp, mpg))
# sometimes with is nicer than the equivalent call using $:
plot(mtcars$disp, mtcars$mpg)
but with is best used interactively and in straightforward scripts. If you get into writing programmatic production code (e.g., your own R package), it's safer to avoid non-standard evaluation. See, for example, the warning in ?subset, another base R function that uses non-standard evaluation.
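One way this can go wrong, sketched with made-up data: when a function argument happens to share its name with a column, subset silently uses the column. The helper names `keep_cyl_nse` and `keep_cyl_se` and the `rows` data frame are hypothetical.

```r
rows <- data.frame(cyl = c(4, 6, 8, 4), mpg = c(30, 20, 15, 28))

# Non-standard evaluation: inside subset(), both `cyl`s resolve to the
# column, so the condition is always TRUE and the argument is ignored
keep_cyl_nse <- function(data, cyl) {
  subset(data, cyl == cyl)
}

# Standard evaluation: the column is named explicitly as a string,
# so `cyl` on the right-hand side can only mean the function argument
keep_cyl_se <- function(data, cyl) {
  data[data[["cyl"]] == cyl, , drop = FALSE]
}

n_nse <- nrow(keep_cyl_nse(rows, 4))  # 4: every row is kept
n_se  <- nrow(keep_cyl_se(rows, 4))   # 2: only the rows where cyl is 4
print(c(n_nse, n_se))
```

There is no error or warning in the first version, which is what makes this kind of bug hard to spot in package code.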
You don't need dplyr or lapply to add a BMI column, you can just do data$BMI = data$weight / data$height^2.
You can refer to the columns by index, data[, 2] / data[, 3]^2, or by quoted name, data[, "weight"] / data[, "height"]^2. For both of these methods you could have the user input optional arguments to the function to specify either the column index or the quoted name of the columns to use.
You could also write a function myfun for construction of the column and use it with data$mynewcol <- with(data, myfun(weight, height, other_col)).