This question is an extension of this question: Apply multiple functions to multiple columns in data.table. Given a data.table

DT <- data.table("a"=1:5,
                 "b"=2:6,
                 "c"=c(1, 1, 2, 2, 2))

I want to apply a list of functions to a and b grouping by c. If I don't group by c I get the expected result:

my.summary = function(x) list(mean = mean(x), median = median(x))
DT[, unlist(lapply(.SD, my.summary)), .SDcols = c("a", "b")]
# a.mean a.median   b.mean b.median 
#       3        3        4        4 

When doing the same operation, but grouping by c, I expected to get

 c a.mean a.median   b.mean b.median 
 1   1.5      1.5      2.5      2.5 
 2    4        4        5        5 

but instead I got

DT[, unlist(lapply(.SD, my.summary)), by = c, .SDcols = c("a", "b")]
   c  V1
1: 1 1.5
2: 1 1.5
3: 1 2.5
4: 1 2.5
5: 2 4.0
6: 2 4.0
7: 2 5.0
8: 2 5.0

It seems like the data has been melted, with no way to know which function has been applied (unless you know the order in my.summary). Any suggestions on how to solve this?

  • You may wrap your j in as.list. See the "For the more general case" here: Calculate multiple aggregations with lapply(.SD, …) Commented Jul 31, 2020 at 10:44
  • Thank you! Should I delete the question since a similar question already exists? Commented Jul 31, 2020 at 11:22
  • No, just keep it here as a signpost for future visitors. And thanks for posting a small example and sharing your research and code attempts. Cheers. Commented Jul 31, 2020 at 11:34

1 Answer

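First you need to change your function. When grouping, data.table expects each result column to have the same type in every group, and median can return an integer or a double depending on the input: for integer input it returns an integer when the number of values is odd and a double when it is even.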

my.summary <- function(x) list(mean = mean(x), median = as.numeric(median(x)))
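
To see why the conversion matters, here is a small base R check using the values of column a from the question's two groups (1:2 for c == 1 and 3:5 for c == 2). The variable names are only for illustration.

```r
# median() on integer input: the type depends on whether the length is odd or even
even_type  <- typeof(median(1:2))              # even length: mean of the two middle values
odd_type   <- typeof(median(3:5))              # odd length: the middle value itself
fixed_type <- typeof(as.numeric(median(3:5)))  # forced to double, as in my.summary above

print(c(even = even_type, odd = odd_type, fixed = fixed_type))
#     even       odd     fixed
# "double" "integer"  "double"
```

Without as.numeric, the median column would be double in one group and integer in the other, which is the inconsistency described above.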

Then you need to ensure that only the first level of the nested list is unlisted. The result of the unlist call still needs to be a list (remember, a data.table is a list of column vectors).

DT[, unlist(lapply(.SD, my.summary), recursive = FALSE), by = c, .SDcols = c("a", "b")]
#   c a.mean a.median b.mean b.median
#1: 1    1.5      1.5    2.5      2.5
#2: 2    4.0      4.0    5.0      5.0