I want to do something very simple, but so far I have failed to do it in one command. I want to create a new data table by applying a function to some columns of an existing one, giving the results new names and dropping the rest. Let's see a minimal example:

library(data.table)
dt = data.table(A = c('a', 'a', 'a', 'b', 'b'),
                B = c(1  , 2  , 3  , 4  , 5  ),
                C = c(10 , 20 , 30 , 40 , 50))
dt
A   B   C
a   1   10
a   2   20
a   3   30
b   4   40
b   5   50

For a single column, we can do:

dt1 = dt[, .(totalB = sum(B)), by=A]
dt1
A   totalB
a   6
b   9

For more than one column, we can do:

dt2 = dt[, .(totalB = sum(B), totalC = sum(C)), by=A]
dt2
A   totalB   totalC
a   6        60
b   9        90

But with many columns, writing each one out by hand is not practical. So I guess we should use lapply, like this:

dt3 = dt[, lapply(.SD, sum), by = A]
dt3
A   B   C
a   6   60
b   9   90

That creates the table, but the columns keep their original names instead of the new ones. So we can try to add the names:

names = c("totalB", "totalC")
dt4 = dt[, c("totalB", "totalC") := lapply(.SD, sum), by = A ]
dt4
A   B   C   totalB  totalC
a   1   10  6       60
a   2   20  6       60
a   3   30  6       60
b   4   40  9       90
b   5   50  9       90

But now the original columns remain. How can we prevent that? Also note that in my actual problem I use a subset of the columns, via .SDcols, which I didn't include here for simplicity.

EDIT: My desired output is the same as dt2 but I don't want to write down all columns.

  • Why not just select the relevant columns when creating dt4? You can add an additional [] and select the two columns you're interested in. Commented Jan 16, 2021 at 18:57
  • @hannes101 I am actually interested in plenty of columns, though, so it's not that simple. Plus, intuitively, there should be a way to do what I want, given that it's so close. Commented Jan 16, 2021 at 19:03

2 Answers

0

Do you mean something like below?

dt[, setNames(lapply(.SD, sum), paste0("total", names(.SD))), A]
Output:
   A totalB totalC
1: a      6     60
2: b      9     90
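
The question also mentions a separate names vector and a subset of columns chosen via .SDcols. A minimal sketch of how the same idea might look in that case is below. The variables `cols` and `new_names` are assumptions made up for illustration; they do not appear in the original post.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

# Hypothetical inputs: the columns to aggregate and the names for the results
cols      <- c("B", "C")
new_names <- c("totalB", "totalC")

# setNames() (from stats, capital N) renames the plain list returned by lapply();
# no `:=` is used, so dt is not modified and only A plus the new columns come back
res <- dt[, setNames(lapply(.SD, sum), new_names), by = A, .SDcols = cols]
res
```

Because nothing is assigned by reference here, the result has one row per group and the original columns B and C are absent from it.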
4 Comments

Sorry but no. I think that's actually the same as my dt4. Also, I will not be using totalB etc so I definitely want the names vector.
@cgss What's your desired output? Please show an example
My desired output is the same as dt2 but if I have like 10 columns writing `.(name1 = sum(col1), ... , name10=sum(col10))` looks kinda bad. I will edit the question to add this.
Your update works in the MWE. It also works with my names variable instead of the paste function. Strangely enough, it results in the following error in my actual project: `setnames(lapply(.SD, sum), names): x is not a data.table or data.frame`. Clearly, it is a data frame.
0

Another option is setnames. Create a vector ('nm1') of the columns to aggregate, i.e. every column except the grouping variable. Then compute the sums grouped by 'A', and call setnames on the result with the old and new names specified:

nm1 <- setdiff(names(dt), "A")
setnames(dt[, lapply(.SD, sum), A], nm1, paste0('total', nm1))[]
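
Note that `setnames()` (data.table, lower-case n) renames a data.table or data.frame by reference, while `setNames()` (stats, capital N) works on any vector or list. Calling the lower-case one on the plain list that `lapply()` returns inside `j` is a likely cause of the "x is not a data.table or data.frame" error mentioned in the comments. Below is a sketch of the setnames approach with an explicit names vector; `cols` and `new_names` are illustrative assumptions, not names from the original post.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

# Hypothetical inputs: the columns to aggregate and their replacement names
cols      <- c("B", "C")
new_names <- c("totalB", "totalC")

# Aggregate first: this returns a new data.table, so dt itself is untouched
agg <- dt[, lapply(.SD, sum), by = A, .SDcols = cols]

# setnames() then renames that result by reference (old names -> new names)
setnames(agg, cols, new_names)
agg
```

Splitting the call into two steps keeps the rename on an actual data.table rather than on the intermediate list.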
