I'm trying to better understand the data.table package in R. I want to run different types of calculations on some columns and assign the results to new columns with specific names. Here is an example:


library(data.table)

set.seed(122)
df <- data.frame(rain = rep(5, 10), temp = 1:10, skip = sample(0:2, 10, TRUE),
                 windw_sz = sample(1:2, 10, TRUE), city = c(rep("a", 5), rep("b", 5)),
                 ord = rep(sample(1:5, 5), 2))


df <- as.data.table(df)
vars <- c("rain","temp")

df[, paste0("mean.",vars) := lapply(mget(vars),mean), by="city" ]

This works just fine. But now I also want to calculate the sum of these variables, so I try:

df[, c(paste0("mean.",vars), paste("sum.",vars)) := list( lapply(mget(vars),mean),
                                                          lapply(mget(vars),sum)), by="city" ]

and I get an error.

How could I implement this last part?

Thanks a lot!

1 Answer

Instead of wrapping the two lapply calls in list, wrap them in c. Each lapply call already returns a list, so wrapping them in list produces a list of two lists. With c, the two lists are concatenated end to end (compare c(as.list(1:5), as.list(6:10)) with list(as.list(1:5), as.list(6:10))), which gives one element per new column. Also, instead of mget, make use of .SDcols:

library(data.table)
df[, paste0(rep(c("mean.", "sum."), each = 2),  vars) := 
       c(lapply(.SD, mean), lapply(.SD, sum)), by = .(city), .SDcols = vars]
df
#    rain temp skip windw_sz city ord mean.rain mean.temp sum.rain sum.temp
# 1:    5    1    0        2    a   2         5         3       25       15
# 2:    5    2    1        1    a   5         5         3       25       15
# 3:    5    3    2        2    a   3         5         3       25       15
# 4:    5    4    2        1    a   4         5         3       25       15
# 5:    5    5    2        2    a   1         5         3       25       15
# 6:    5    6    0        1    b   2         5         8       25       40
# 7:    5    7    2        2    b   5         5         8       25       40
# 8:    5    8    1        2    b   3         5         8       25       40
# 9:    5    9    2        1    b   4         5         8       25       40
#10:    5   10    2        2    b   1         5         8       25       40
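
To see the difference between c and list described above outside of data.table, here is a small base R sketch. The names a, b, nested and flat are illustrative only and do not come from the original example.

```r
# Two lists, shaped like the ones each lapply() call returns
a <- lapply(list(rain = 1:3, temp = 4:6), mean)
b <- lapply(list(rain = 1:3, temp = 4:6), sum)

nested <- list(a, b)  # a list of two lists: 2 elements, each itself a list
flat   <- c(a, b)     # one flat list: 4 elements, one per new column

length(nested)  # 2
length(flat)    # 4
```

The flat form has as many elements as there are column names on the left-hand side of :=, which is why the c version lines up with the four new columns.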
1 Comment

Great, thanks a lot. Just two clarifications: (1) Why did you suggest using .SDcols instead of mget? Is it only for efficiency? mget also seems to work, right? (2) Why did you wrap the by argument in a list, .(city)? Using just "city" also seems to work for me. Thanks a lot for your help.
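
Both points in this comment can be checked with a small sketch. It uses a cut-down table dt that is not part of the original post, and it only compares results; it says nothing about speed. Under those assumptions, by = "city" and by = .(city) should give the same groups, and mget(vars) and .SDcols = vars should select the same columns.

```r
library(data.table)

# Hypothetical cut-down table, just for this comparison
dt <- data.table(rain = rep(5, 10), temp = 1:10,
                 city = rep(c("a", "b"), each = 5))
vars <- c("rain", "temp")

# The same grouped means, written three ways
r1 <- dt[, lapply(.SD, mean), by = .(city), .SDcols = vars]
r2 <- dt[, lapply(.SD, mean), by = "city", .SDcols = vars]
r3 <- dt[, lapply(mget(vars), mean), by = "city"]

all.equal(r1, r2)  # compares the two spellings of by
all.equal(r1, r3)  # compares .SDcols with mget
```

If both comparisons come back TRUE, the choice between these forms is a matter of style and efficiency rather than of results.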
