
Apologies if this is a duplicate. I am very new to data.table, and have seen very similar questions asked here but none that exactly answered my question.

I would like to find a concise syntax to aggregate multiple columns of a data.table with the same aggregation function, with customized names of the resulting aggregated columns.

Setup:

library(data.table)
data(mtcars)
setDT(mtcars)

If I call

mtcars[, lapply(.SD, sum, na.rm = TRUE), by = .(am, gear), .SDcols = c('mpg','cyl')]

The result is

   am gear   mpg cyl
1:  1    4 210.2  36
2:  0    3 241.6 112
3:  0    4  84.2  20
4:  1    5 106.9  30

This is great but I want the last two columns to be called by customized names that I define ahead of time.

I can achieve the desired result with

mtcars[, .(sum_of_mpg = sum(mpg, na.rm = TRUE), sum_of_cyl = sum(cyl, na.rm = TRUE)), by = .(am, gear)]

This results in

   am gear sum_of_mpg sum_of_cyl
1:  1    4      210.2         36
2:  0    3      241.6        112
3:  0    4       84.2         20
4:  1    5      106.9         30

But this approach does not generalize, because it does not let me define the custom names beforehand.

I've tried the code below and various variants of it, but nothing gives this result in one step.

custom_names <- c('sum_of_mpg','sum_of_cyl')
mtcars[, (custom_names) = lapply(.SD, sum, na.rm = TRUE), by = .(am, gear), .SDcols = c('mpg','cyl')]

Is there a way to do this concisely? This is necessary because the code may be embedded in a function and may need to work on an indefinite number of columns.

2 Answers


Here is one possible solution:

in_names <- c('mpg','cyl')
custom_names <- c('sum_of_mpg','sum_of_cyl')

mtcars[, lapply(.SD, sum, na.rm = TRUE), by = .(am, gear), .SDcols = in_names][
  , setnames(.SD, in_names, custom_names)][]
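
If the chained form feels opaque, the same renaming can be sketched as two plain steps: aggregate first, then call setnames() on the result. This is only an illustration, not a replacement for the chain above. The names dt and agg are placeholders introduced here, and a copy of mtcars is used so the original data set is left untouched.

```r
library(data.table)

# Work on a data.table copy of mtcars (dt is a placeholder name)
dt <- as.data.table(mtcars)

in_names <- c("mpg", "cyl")
custom_names <- c("sum_of_mpg", "sum_of_cyl")

# Step 1: aggregate the selected columns by group
agg <- dt[, lapply(.SD, sum, na.rm = TRUE), by = .(am, gear), .SDcols = in_names]

# Step 2: rename the aggregated columns by reference
setnames(agg, in_names, custom_names)

print(agg)
```

Because setnames() modifies agg in place, no reassignment is needed after the second step.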

Another, slightly more complex solution you can try:

mtcars[, as.list(unlist(lapply(.SD, function(x) list(sum = sum(x))))),
       by = .(am, gear),
       .SDcols = in_names]

Improved solution:

mtcars[, sapply(.SD, function(x) list(sum = sum(x))),
  .SDcols = in_names,
  by = .(am, gear)]

2 Comments

I like the "pipe" style here. I guess it isn't possible to assign the names in the first bracketed statement, though?
I think it's possible, but I couldn't get it to work after several failed attempts. Luckily, I finally found a satisfying solution. Please see the edited answer.

Single [-call using .SDcols and setNames within the lapply:

cols <- c('mpg','cyl')
mtcars[, lapply(setNames(.SD, paste0("sum_of_", cols)), sum, na.rm = TRUE),
       by = .(am, gear), .SDcols = cols]
#       am  gear sum_of_mpg sum_of_cyl
#    <num> <num>      <num>      <num>
# 1:     1     4      210.2         36
# 2:     0     3      241.6        112
# 3:     0     4       84.2         20
# 4:     1     5      106.9         30
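
Since the question mentions embedding this in a function that handles an arbitrary number of columns, here is a minimal sketch of how the setNames approach might be wrapped. The function name sum_by and its arguments (dt, cols, by_cols, out_names) are hypothetical and not part of data.table. The sketch also assumes by_cols and cols are character vectors of existing column names.

```r
library(data.table)

# Hypothetical helper: sum the columns in `cols` within the groups given by
# `by_cols`, naming the results `out_names` (defaults to a "sum_of_" prefix).
sum_by <- function(dt, cols, by_cols, out_names = paste0("sum_of_", cols)) {
  # One output name is needed per aggregated column
  stopifnot(length(cols) == length(out_names))
  dt[, lapply(setNames(.SD, out_names), sum, na.rm = TRUE),
     by = by_cols, .SDcols = cols]
}

# Usage on a data.table copy of mtcars, with three columns instead of two
dt <- as.data.table(mtcars)
res <- sum_by(dt, cols = c("mpg", "cyl", "hp"), by_cols = c("am", "gear"))
print(res)
```

Passing out_names explicitly lets the caller define the custom names ahead of time, while the default keeps the call short when a common prefix is enough.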
