Apply multiple functions to multiple columns in data.table by group [duplicate]

Question

This question is an extension of this question: Apply multiple functions to multiple columns in data.table. Given a data.table

DT <- data.table("a"=1:5,
                 "b"=2:6,
                 "c"=c(1, 1, 2, 2, 2))

I want to apply a list of functions to a and b grouping by c. If I don't group by c I get the expected result:

my.summary = function(x) list(mean = mean(x), median = median(x))
DT[, unlist(lapply(.SD, my.summary)), .SDcols = c("a", "b")]
# a.mean a.median   b.mean b.median 
#       3        3        4        4

When doing the same operation, but grouping by c, I expected to get

 c a.mean a.median   b.mean b.median 
 1   1.5      1.5      2.5      2.5 
 2    4        4        5        5

but instead I got

DT[, unlist(lapply(.SD, my.summary)), by = c, .SDcols = c("a", "b")]
   c  V1
1: 1 1.5
2: 1 1.5
3: 1 2.5
4: 1 2.5
5: 2 4.0
6: 2 4.0
7: 2 5.0
8: 2 5.0

It seems like the data has been melted, with no way to tell which function was applied (unless you know the order of the functions in my.summary). Any suggestions on how to solve this?

You may wrap your j in as.list. See "For the more general case" here: Calculate multiple aggregations with lapply(.SD, …)
– Henrik, Commented Jul 31, 2020 at 10:44
Thank you! Should I delete the question, since a similar question already exists?
– J.C.Wahl, Commented Jul 31, 2020 at 11:22
No, just keep it here as a signpost for future visitors. And thanks for posting a small example and sharing your research and code attempts. Cheers.
– Henrik, Commented Jul 31, 2020 at 11:34

Roland · Accepted Answer · 2020-07-31 10:45:03Z

3

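First you need to change your function. data.table expects each result column to have the same type in every group, and median can return an integer or a double depending on its input: for an integer vector it returns an integer when the length is odd and a double when the length is even. Here a and b are integer columns and the two groups have different sizes, so the types would not match.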

my.summary <- function(x) list(mean = mean(x), median = as.numeric(median(x)))
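To see why the coercion matters, here is a small base R sketch of the type behaviour. The vectors `odd_group` and `even_group` are made-up names for illustration; they mirror the values of column a in the two groups of the example.

```r
# Base R only: median() on integer input changes type with the group size.
odd_group  <- c(3L, 4L, 5L)  # like column a where c == 2 (three rows)
even_group <- c(1L, 2L)      # like column a where c == 1 (two rows)

# Odd length: the middle element is returned as-is, so it stays integer.
type_odd <- typeof(median(odd_group))

# Even length: the two middle elements are averaged, which gives a double.
type_even <- typeof(median(even_group))

# Wrapping in as.numeric() gives the same type for every group.
type_fixed <- typeof(as.numeric(median(odd_group)))

print(c(odd = type_odd, even = type_even, fixed = type_fixed))
```

With `as.numeric()` in place, every group contributes a double to the median columns, which is what a grouped data.table result needs.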

Then you need to ensure that only the first level of the nested list is unlisted. The result of the unlist call still needs to be a list (remember, a data.table is a list of column vectors).

DT[, unlist(lapply(.SD, my.summary), recursive = FALSE), by = c, .SDcols = c("a", "b")]
#   c a.mean a.median b.mean b.median
#1: 1    1.5      1.5    2.5      2.5
#2: 2    4.0      4.0    5.0      5.0
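For comparison, the `as.list` approach suggested in the comments can be sketched as follows. This is only an illustration under the assumption that every summary value is numeric; the name `res` is introduced here and is not part of the original post.

```r
library(data.table)

DT <- data.table(a = 1:5, b = 2:6, c = c(1, 1, 2, 2, 2))

# Original function from the question, without the as.numeric() coercion.
my.summary <- function(x) list(mean = mean(x), median = median(x))

# unlist() flattens the nested list into one named vector per group and
# coerces all values to a common type (double here); as.list() then turns
# that vector back into a list, so each name becomes its own column.
res <- DT[, as.list(unlist(lapply(.SD, my.summary))),
          by = c, .SDcols = c("a", "b")]

print(res)
```

Because the full `unlist()` coerces everything to one type, this variant does not need the `as.numeric()` fix, but it would also turn every column into character if one of the summaries returned a string. `unlist(..., recursive = FALSE)` keeps each summary's own type.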
