
A data.table in R is queried with three main components: DT[i, j, by].

I am creating subsets of my data.table DT using the by functionality, which passes each subset of my data to j, where I can perform operations on it. Within each subset, I can specify the columns I want to use in j.

From the documentation (slightly altered by me):

DT[, lapply(.SD, mean), by=., .SDcols=...] - applies the function (here, mean) to all columns specified in .SDcols, grouping by the columns specified in by.
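To make the idiom concrete, here is a small sketch with made-up data (the column names x, y, z and their values are hypothetical). It also shows why the question arises: with the default arguments, mean returns NA for any group that contains a missing value.

```r
library(data.table)

# Hypothetical toy data: column x has an NA inside group z == 2
DT <- data.table(x = c(1, 2, 3, 4, NA),
                 y = c(10, 20, 30, 40, 50),
                 z = c(1, 1, 1, 2, 2))

# Apply mean to the columns named in .SDcols, separately for each value of z
res <- DT[, lapply(.SD, mean), by = z, .SDcols = c("x", "y")]
print(res)
# Group z == 2 gets x = NA, because mean() does not drop NAs by default
```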

This is great functionality!

I would like to know whether it is possible to supply additional arguments to the function used in j, in this case mean.

The function mean takes the following arguments:

mean(x, trim = 0, na.rm = FALSE, ...)
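For reference, a quick base R sketch of what the two named arguments do on their own (the vectors are arbitrary illustrative values, not from the question):

```r
v <- c(1, 2, 3, NA)
mean(v)                # NA: missing values propagate by default
mean(v, na.rm = TRUE)  # 2: the NA is dropped before averaging

w <- c(1, 2, 3, 100)
mean(w, trim = 0.25)   # 2.5: one value is trimmed from each end, leaving 2 and 3
```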

How can I use mean within the j section AND apply, for example, na.rm = TRUE?


On a side note, I had a similar problem with the Reduce function, which applies a function to a data set recursively. The best idea I found was to create a custom version of the function to apply, something like:

# Wrapper that always calls mean() with na.rm = TRUE
my_mean <- function(Data) {

    output <- mean(Data, na.rm = TRUE)

    return(output)
}

Then, using the example above, I would run:

DT[, lapply(.SD, my_mean), by=., .SDcols=...]
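A named wrapper works, but the same effect can be had inline with an anonymous function, which avoids defining my_mean separately. A minimal sketch, reusing hypothetical toy data (x, y, z are illustrative names):

```r
library(data.table)

# Hypothetical toy data: column x has an NA inside group z == 2
DT <- data.table(x = c(1, 2, 3, 4, NA),
                 y = c(10, 20, 30, 40, 50),
                 z = c(1, 1, 1, 2, 2))

# The anonymous function fixes na.rm = TRUE for every column in .SDcols
res <- DT[, lapply(.SD, function(col) mean(col, na.rm = TRUE)),
          by = z, .SDcols = c("x", "y")]
print(res)
```

Group z == 2 now gets x = 4 instead of NA, since the missing value is dropped.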

1 Answer


You can pass the extra arguments to lapply, which forwards them to the function:

library(data.table)
DT = data.table(x = c(1, 2, 3, 4, NA), y = runif(5), z = c(1, 1, 1, 2, 2))
DT[, lapply(.SD, mean, na.rm = TRUE), by = z]

2 Comments

Worth noting that this has nothing to do with data.table; see ?lapply.
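To illustrate that point, the same forwarding happens with a plain list and no data.table at all; any named arguments given after FUN travel through lapply's ... to each call. The list below is illustrative data only:

```r
# Plain list, no data.table involved
cols <- list(a = c(1, NA, 3), b = c(2, 4, NA))

# na.rm = TRUE is passed through lapply's ... to every call of mean()
res <- lapply(cols, mean, na.rm = TRUE)
str(res)  # a = 2, b = 3
```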
It works similarly with Reduce: used in j, it is applied to the values in each group defined by by.
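One difference is worth keeping in mind: unlike lapply, base R's Reduce() has no ... argument, so extra arguments cannot simply be appended after the function. They have to be fixed inside a closure (or a named wrapper like my_mean in the question). A minimal sketch with hypothetical data, folding pmax(..., na.rm = TRUE) across the .SD columns within each group:

```r
library(data.table)

# Hypothetical toy data with scattered NAs
DT <- data.table(x = c(1, NA, 3, 4, NA),
                 y = c(5, 2, NA, 1, NA),
                 z = c(1, 1, 1, 2, 2))

# Reduce() cannot forward na.rm, so the closure supplies it to pmax().
# The result is a row-wise maximum across x and y, computed per group of z.
res <- DT[, .(row_max = Reduce(function(a, b) pmax(a, b, na.rm = TRUE), .SD)),
          by = z, .SDcols = c("x", "y")]
print(res)
```

The last row stays NA because both of its values are missing, which is what pmax returns in that case even with na.rm = TRUE.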
