I want to do something very simple, but so far I have failed to do it in one command. I want to create a new data table by applying a function to some columns of an existing one, giving the results new names and dropping the rest. Let's see a minimal example:
library(data.table)
dt = data.table(A = c('a', 'a', 'a', 'b', 'b'),
                B = c(1, 2, 3, 4, 5),
                C = c(10, 20, 30, 40, 50))
dt
A B C
a 1 10
a 2 20
a 3 30
b 4 40
b 5 50
For a single column, we can do:
dt1 = dt[, .(totalB = sum(B)), by=A]
dt1
A totalB
a 6
b 9
For more than one column, we can do:
dt2 = dt[, .(totalB = sum(B), totalC = sum(C)), by=A]
dt2
A totalB totalC
a 6 60
b 9 90
But with many columns, writing each one out by hand is impractical. So I guess we should use lapply, like this:
dt3 = dt[, lapply(.SD, sum), by = A]
dt3
A B C
a 6 60
b 9 90
That creates the table, but the columns keep their original names instead of the new ones. So we can add them:
names = c("totalB", "totalC")
dt4 = dt[, (names) := lapply(.SD, sum), by = A]
dt4
A B C totalB totalC
a 1 10 6 60
a 2 20 6 60
a 3 30 6 60
b 4 40 9 90
b 5 50 9 90
But now the original columns remain. How can we prevent that? Also note that in my actual problem I use a subset of the columns, via .SDcols, which I didn't include here for simplicity.
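For context, here is a rough sketch of why the original columns stick around: := adds columns to dt by reference rather than building a new aggregated table, so dt4 is just dt with two extra columns and the same number of rows. The cols and new_cols vectors are assumptions for illustration, standing in for the real .SDcols subset that isn't shown here.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

cols <- c("B", "C")                  # hypothetical subset passed to .SDcols
new_cols <- paste0("total", cols)    # hypothetical names for the new columns

# := modifies dt in place and returns it invisibly;
# the per-group sums are recycled onto every row of each group
dt4 <- dt[, (new_cols) := lapply(.SD, sum), by = A, .SDcols = cols]

names(dt)   # dt itself now carries the new columns as well
nrow(dt4)   # still one row per original row, not one per group
```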
EDIT: My desired output is the same as dt2, but I don't want to write out every column by hand.
Chain another [] onto the call and select the two columns you're interested in.
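For what it's worth, here is a minimal sketch of one way to get a dt2-shaped result in a single call without typing each column: wrap the lapply in base R's setNames so that j returns an already-named list, and skip := entirely. The cols vector and the "total" prefix are assumptions for illustration, standing in for whatever .SDcols subset is actually used.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

cols <- c("B", "C")   # hypothetical subset of columns to aggregate

# j returns a named list, so the result has one row per group
# and only the grouping column plus the renamed aggregates
res <- dt[, setNames(lapply(.SD, sum), paste0("total", cols)),
          by = A, .SDcols = cols]

res
```

Because j returns a plain list instead of using :=, dt itself is left untouched and only A plus the renamed sums come back.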