R data.table: New data table with named columns and drop the rest

Question

I want to do something very simple, but so far I have failed to do it in one command. I want to create a new data table by applying a function to some columns of an existing one, giving the results new names and dropping the rest. Here is a minimal example:

library(data.table)
dt = data.table(A = c('a', 'a', 'a', 'b', 'b'),
                B = c(1  , 2  , 3  , 4  , 5  ),
                C = c(10 , 20 , 30 , 40 , 50))
dt
A   B   C
a   1   10
a   2   20
a   3   30
b   4   40
b   5   50

For a single column, we can do:

dt1 = dt[, .(totalB = sum(B)), by=A]
dt1
A   totalB
a   6
b   9

For more than one column, we can do:

dt2 = dt[, .(totalB = sum(B), totalC = sum(C)), by=A]
dt2
A   totalB   totalC
a   6        60
b   9        90

But with many columns, writing each one out is impractical. So I guess we should go with lapply, like this:

dt3 = dt[, lapply(.SD, sum), by = A]
dt3
A   B   C
a   6   60
b   9   90

That creates the table but keeps the original column names instead of new ones. So we can add them:

names = c("totalA", "totalB")
dt4 = dt[, c("totalA", "totalB") := lapply(.SD, sum), by = A ]
dt4
A   B   C    totalA   totalB
a   1   10   6        60
a   2   20   6        60
a   3   30   6        60
b   4   40   9        90
b   5   50   9        90

But now the original columns remain. How can we prevent that? Also note that in my actual problem I use a subset of the columns, via .SDcols, which I didn't include here for simplicity.

EDIT: My desired output is the same as dt2 but I don't want to write down all columns.

Why not just select the relevant columns when creating dt4? You can add an additional [] and select the two columns you're interested in.
– hannes101, Commented Jan 16, 2021 at 18:57
@hannes101 I am actually interested in plenty of columns, though, so it's not that simple. Plus, intuitively, there should be a way to do what I want, given that it's so close.
– cgss, Commented Jan 16, 2021 at 19:03

ThomasIsCoding · Accepted Answer · 2021-01-16 19:04:31Z

0

Do you mean something like below?

dt[, setNames(lapply(.SD, sum), paste0("total", names(.SD))), A]

Output

   A totalB totalC
1: a      6     60
2: b      9     90
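
The same pattern should carry over to a hand-written names vector and a column subset via .SDcols, both of which the question mentions. A minimal sketch, assuming the hypothetical variables `cols` and `new_names` (neither is from the original post):

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

# Hypothetical: the columns to aggregate and the names their results should get
cols      <- c("B", "C")
new_names <- c("sumOfB", "sumOfC")

# lapply() over .SD returns a plain list; base R's setNames() renames its
# elements, and data.table turns the named list into the result columns.
res <- dt[, setNames(lapply(.SD, sum), new_names), by = A, .SDcols = cols]

names(res)  # "A" "sumOfB" "sumOfC"
names(dt)   # "A" "B" "C" (unchanged, since no `:=` is involved)
```

The order of `new_names` has to match the order of `cols`, because the renaming is purely positional.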

edited Jan 16, 2021 at 19:04

answered Jan 16, 2021 at 18:44

ThomasIsCoding


4 Comments

cgss Over a year ago

Sorry but no. I think that's actually the same as my dt4. Also, I will not be using totalB etc so I definitely want the names vector.

ThomasIsCoding Over a year ago

@cgss What's your desired output? Please show an example

cgss Over a year ago

My desired output is the same as dt2, but if I have like 10 columns, writing `.(name1 = sum(col1), ... , name10=sum(col10))` looks kinda bad. I will edit the question to add this.

cgss Over a year ago

Your update works in the MWE. It also works with my names variable instead of the paste function. Strangely enough, it results in the following error in my actual project: `setnames(lapply(.SD, sum), names): x is not a data.table or data.frame`. Clearly, it is a data frame.
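
The error quoted in this last comment is consistent with calling data.table's `setnames()` (lowercase n) where the answer uses base R's `setNames()` (capital N). Inside `j`, `lapply(.SD, sum)` returns a plain list, which `setnames()` does not accept. A small sketch of the difference, assuming that is what happened in the asker's project (the thread does not confirm it):

```r
library(data.table)

# Stand-in for what lapply(.SD, sum) returns inside j: a plain list
x <- list(B = 6, C = 60)

# stats::setNames() works on any object that can carry names
renamed <- setNames(x, c("totalB", "totalC"))

# data.table::setnames() renames columns by reference and requires a
# data.table or data.frame, so a plain list raises an error instead
err <- tryCatch(setnames(x, c("totalB", "totalC")),
                error = function(e) conditionMessage(e))
```

Here `renamed` is a list with the new names, while `err` holds the error message produced by `setnames()`.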

akrun · Accepted Answer · 2021-01-16 19:05:59Z

0

Another option is setnames. Create a vector ('nm1') of the columns to apply the function to, i.e. every column except the grouping variable. Then, grouped by 'A', get the sum, and use setnames with the old and new names specified:

nm1 <- setdiff(names(dt), "A")
setnames(dt[, lapply(.SD, sum), A], nm1, paste0('total', nm1))[]
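
For reference, a self-contained sketch of this approach on the question's data. Because `setnames()` works by reference on the aggregated result, the original `dt` should keep its column names. The variable `res` is introduced here only for illustration.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "a", "b", "b"),
                 B = c(1, 2, 3, 4, 5),
                 C = c(10, 20, 30, 40, 50))

# Every column except the grouping variable
nm1 <- setdiff(names(dt), "A")

# Aggregate first, then rename the columns of the new table in place
res <- setnames(dt[, lapply(.SD, sum), by = A], nm1, paste0("total", nm1))

names(res)  # "A" "totalB" "totalC"
names(dt)   # "A" "B" "C" (unchanged)
```

Note that the `:=` step in the question modifies `dt` by reference, so if it was run beforehand, `dt` already holds the extra columns and `nm1` would pick them up too. Rebuilding `dt` as above avoids that.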

answered Jan 16, 2021 at 19:05

akrun

