Data tables in R have three (main) components: DT[i, j, by].
I am creating subsets of my data.table DT using the by argument, which passes each subset of my data to j, where I can perform operations on it. Within each of the new subsets, I can specify the columns I want to use in j.
From the documentation (slightly altered by me):
DT[, lapply(.SD, mean), by=., .SDcols=...] - applies fun (=mean) to all columns specified in .SDcols while grouping by the columns specified in by.
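To make the idiom concrete, here is a minimal sketch on a made-up table. The column names g, a and b are my own invention for illustration and do not come from the documentation.

```r
library(data.table)

# Toy data; the column names g, a and b are hypothetical
DT <- data.table(
  g = c("x", "x", "y", "y"),
  a = c(1, 3, 5, 7),
  b = c(10, 20, 30, 40)
)

# Mean of columns a and b within each group of g
res <- DT[, lapply(.SD, mean), by = g, .SDcols = c("a", "b")]
print(res)
```

Here .SD contains only the columns named in .SDcols, and j is evaluated once per group of g.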
This is great functionality!
I would like to know if it is possible to supply arguments to the function being used in j - in this case: mean?
The function mean can take the following inputs:
mean(x, trim = 0, na.rm = FALSE, ...)
How can I use mean within the j section AND apply, for example, na.rm = TRUE?
On a side note, I had a similar problem with the Reduce function, which applies a function to a data set recursively. The best idea I found was to create a custom version of the function to apply, so something like:
my_mean <- function(Data) {
  output <- mean(Data, na.rm = TRUE)
  return(output)
}
Then, using the example above, I would run:
DT[, lapply(.SD, my_mean), by=., .SDcols=...]
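As a rough sketch of the wrapper approach on toy data containing NA values (again, the column names g, a and b are made up), compared against two variants I believe should be equivalent: base R's lapply(X, FUN, ...) forwards any extra arguments to FUN, and an anonymous function can wrap the call inline. I am assuming all three behave the same inside j.

```r
library(data.table)

# Toy data with missing values; column names are hypothetical
DT <- data.table(
  g = c("x", "x", "y", "y"),
  a = c(1, NA, 5, 7),
  b = c(10, 20, NA, 40)
)

# Wrapper that fixes na.rm = TRUE
my_mean <- function(Data) {
  mean(Data, na.rm = TRUE)
}

# Variant 1: the named wrapper
res_wrapper <- DT[, lapply(.SD, my_mean), by = g, .SDcols = c("a", "b")]

# Variant 2: extra arguments after FUN are passed on to mean by lapply
res_dots <- DT[, lapply(.SD, mean, na.rm = TRUE), by = g, .SDcols = c("a", "b")]

# Variant 3: an anonymous function written inline
res_anon <- DT[, lapply(.SD, function(x) mean(x, na.rm = TRUE)),
               by = g, .SDcols = c("a", "b")]

print(res_wrapper)
```

If these do agree, the second variant would avoid having to define a separate function for every argument combination.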