0

I have a data.table like so:

library(data.table)
dt = data.table(id_1 = rep(1:3, 5), id_2 = sort(rep(c('A', 'B', 'C'), 5)), value_1 = rnorm(15, 1, 1), value_2 = rpois(15, 1))

I would like to create a function that groups the table by columns given in one parameter and applies an operation (let's say sum) to several other columns given in another parameter. Finally, I'd like to specify the names of the new columns through a third parameter. My problem is that I don't really know how to set names from a character vector when I am not using assignment by reference (:=).

The following two approaches achieve exactly what I want; I just don't like how they do it:

Approach 1: use assignment by reference, then keep only one record per group (dropping the original columns).

dt_aggregator_1 <- function(data,
                          group_cols = c('id_1', 'id_2'),
                          new_names = c('sum_value_1', 'sum_value_2'),
                          value_cols = c('value_1', 'value_2')){
  data_out = copy(data)  # copy first: a plain `=` would let := modify the caller's table by reference
  data_out[,(new_names) := lapply(.SD, function(x){sum(x)}),by = group_cols, .SDcols = value_cols]
  data_out[,lapply(.SD, max), by = group_cols, .SDcols = new_names]
}

Approach 2: rename the columns after grouping. I assume this is a much better approach.

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){
  data_out = data[,lapply(.SD, function(x){sum(x)}),by = group_cols, .SDcols = value_cols]
  setnames(data_out, value_cols, new_names)
  data_out[]
}

My question is whether, in approach 2, I can somehow set the names while performing the grouping operation, so that it takes one line of code instead of two :)

1
  • Actually, I'm starting to like the second approach quite a bit, but I'd still like to know how to do it in one line :) Commented Jan 17, 2020 at 11:46

2 Answers

1

You can try the dplyr library:

library(dplyr)

dt1 <- dt %>% group_by(id_1,id_2) %>%
  summarise(
    sum_value_1 = sum(value_1),
    sum_value_2 = sum(value_2)
  )

dt1

1 Comment

Hi, thank you for your answer. However, this is not what I am looking for. I want the function to be versatile, and I want it to be based on data.table.
1

You can include setNames in the same line to make this a one-liner.

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){

  data[,setNames(lapply(.SD, sum), new_names),by = group_cols, .SDcols = value_cols]

}
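
As a quick check of the setNames approach, here is a small sketch that runs it on the example table from the question. The function name aggregate_sums, the length check, and the set.seed call are illustrative choices of mine, not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
dt <- data.table(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c('A', 'B', 'C'), 5)),
                 value_1 = rnorm(15, 1, 1),
                 value_2 = rpois(15, 1))

# Hypothetical helper name; the body is the one-liner from the answer,
# operating on the `data` argument rather than a global table.
aggregate_sums <- function(data, group_cols, value_cols, new_names) {
  # Guard against mismatched name vectors before aggregating.
  stopifnot(length(value_cols) == length(new_names))
  data[, setNames(lapply(.SD, sum), new_names),
       by = group_cols, .SDcols = value_cols]
}

res <- aggregate_sums(dt,
                      group_cols = c('id_1', 'id_2'),
                      value_cols = c('value_1', 'value_2'),
                      new_names  = c('sum_value_1', 'sum_value_2'))

names(res)  # grouping columns first, then the renamed sums
```

The input table is not modified, because no := is involved: the grouped result is a new data.table.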

3 Comments

Is there no way to do it without setNames?
I couldn't think of any other way for summarising values. If you want to add new columns you could do dt[, (new_names) := lapply(.SD, sum),by = group_cols, .SDcols = value_cols]
Yes. I am looking for its equivalent for when I do not wish to add columns.
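
If the aim is only to avoid setNames inside j, one option that still fits in a single expression is to wrap the grouped result in data.table's setnames(), which renames by reference and returns the table invisibly. This is a sketch rather than something proposed in the thread; the function name dt_aggregator_3 and the toy values are hypothetical.

```r
library(data.table)

# Hypothetical variant of approach 2 collapsed into one expression.
dt_aggregator_3 <- function(data, group_cols, value_cols, new_names) {
  # data[...] builds a new aggregated table, so renaming it by reference
  # leaves `data` untouched; the trailing [] makes the result print normally.
  setnames(data[, lapply(.SD, sum), by = group_cols, .SDcols = value_cols],
           value_cols, new_names)[]
}

# Toy input with deterministic values (assumption, for illustration only).
dt <- data.table(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c('A', 'B', 'C'), 5)),
                 value_1 = as.numeric(1:15),
                 value_2 = rep(1L, 15))

out <- dt_aggregator_3(dt,
                       group_cols = c('id_1', 'id_2'),
                       value_cols = c('value_1', 'value_2'),
                       new_names  = c('sum_value_1', 'sum_value_2'))
names(out)
```

It is the same two steps as approach 2, only nested, so whether it reads better than the setNames version is a matter of taste.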
