R aggregate dynamically added columns with a separate function for each of them

Question

I have a dataframe like this:

id  v    t1   t2  t3    t4   date1        list1

1   1.0  1.4   2   0.45   3    2020-09-03   val1
1   1.0  1.6   3   0.55  3.7  2020-09-05   val2

How can I group by id and v and aggregate the columns t1, t2, t3, t4, date1 and list1, applying a different aggregate function to each of them? More specifically:

t1 -> mean
t2 -> max
t3 -> mean
t4 -> max
date1 -> max
list1 -> join, as in Python's ', '.join

After aggregation the frame should look like this:

id  v    t1   t2  t3    t4   date1        list1

1   1.0  1.5   3   0.5   3.7  2020-09-05   val1, val2

One more complication: these columns are added dynamically based on a user selection in an R Shiny app. All the columns I might aggregate are in the data frame, but not all of them need to be aggregated every time; for example, the user could select only t1 and date1 and none of the others. So the aggregation parameters depend on the selected columns, and I do have those column names from the user selection. In other words, how can I build a dynamic aggregation query?

In Python I could build a dict like the one above from the user-selected columns and pass it to something like df.groupby(['id', 'v']).agg(agg_dict).
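In R, a comparable sketch (assuming recent dplyr and rlang; agg_spec is a hypothetical name standing in for the user's selection) keeps the column-to-function mapping in a named character vector, turns each entry into a call such as mean(t1) with rlang::call2(), and splices the named list of calls into summarise() with !!!:

```r
library(dplyr)

df1 <- data.frame(
  id = c(1L, 1L), v = c(1, 1),
  t1 = c(1.4, 1.6), t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7),
  date1 = as.Date(c("2020-09-03", "2020-09-05")),
  list1 = c("val1", "val2")
)

# Hypothetical user selection: column name -> function name (like a Python dict)
agg_spec <- c(t1 = "mean", t2 = "max", date1 = "max", list1 = "toString")

# Build one call per entry, e.g. mean(t1); the list names become output columns
agg_calls <- lapply(names(agg_spec), function(col) {
  rlang::call2(agg_spec[[col]], rlang::sym(col))
})
names(agg_calls) <- names(agg_spec)

res <- df1 %>%
  group_by(id, v) %>%
  summarise(!!!agg_calls, .groups = "drop")
```

Dropping an entry from agg_spec simply drops that output column, which is what a changing Shiny selection needs.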

How can I do this in R? I looked at dplyr::summarise and data.table, but I cannot figure out how to aggregate all of them at once.
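For data.table, one possible sketch (not taken from the answer below; fns is a hypothetical named list built from the user's selection) restricts .SD to the selected columns with .SDcols and uses Map() to apply the i-th function to the i-th column within each group:

```r
library(data.table)

dt <- data.table(
  id = c(1L, 1L), v = c(1, 1),
  t1 = c(1.4, 1.6), t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7),
  date1 = as.Date(c("2020-09-03", "2020-09-05")),
  list1 = c("val1", "val2")
)

# Hypothetical user selection: a named list of functions keyed by column name
fns <- list(t1 = mean, t2 = max, t3 = mean, t4 = max,
            date1 = max, list1 = toString)

# .SDcols = names(fns) keeps the columns of .SD in the same order as fns,
# so Map() pairs each function with its column; Map() keeps the names of fns
res <- dt[, Map(function(f, x) f(x), fns, .SD),
          by = .(id, v), .SDcols = names(fns)]
```

Because .SDcols and fns share the same names, adding or removing a selected column only requires changing fns.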

A quick search brought up smartAgg, which I have not used myself.
– markus, Commented Nov 15, 2020 at 21:27

akrun · Accepted Answer · 2020-11-15 22:29:49Z

We can use across to apply functions to blocks of columns.

library(dplyr)
df1 %>% 
   group_by(id, v) %>% 
   summarise(across(c(t1, t3), mean),
             across(c(t2, t4, date1), max), 
             list1 = toString(list1), .groups = 'drop')

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2

If the functions and column names are all user input, they can be stored as strings:

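# column groups and function names, e.g. taken from user input
nm1 <- c("t1", "t3")
nm2 <- c("t2", "t4", "date1")
nm3 <- c("list1")

f1 <- "mean"
f2 <- "max"
f3 <- "toString"

# all_of() selects columns from a character vector and
# match.fun() turns each function name into the function itself;
# the := form below assumes nm3 holds a single column name
df1 %>%
    group_by(id, v) %>%
    summarise(across(all_of(nm1), ~ match.fun(f1)(.)),
              across(all_of(nm2), ~ match.fun(f2)(.)),
              !! nm3 := match.fun(f3)(!! rlang::sym(nm3)), .groups = 'drop')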

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2

The calls can also be built as a string, parsed into expressions and evaluated:

expr1 <- glue::glue('across(c({toString(nm1)}), {f1});',
                    'across(c({toString(nm2)}), {f2});',
                    'across(c({toString(nm3)}), {f3})')
df1 %>% 
     group_by(id, v) %>%
     summarise(!!! rlang::parse_exprs(expr1), .groups = 'drop')

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2
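A variant of the same idea that avoids pasting and parsing strings (a sketch; spec is a hypothetical name for the user input, mapping each function name to the columns it applies to) builds each across() call with rlang::expr() and inlines the column vector and the function symbol with !!:

```r
library(dplyr)

df1 <- data.frame(
  id = c(1L, 1L), v = c(1, 1),
  t1 = c(1.4, 1.6), t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7),
  date1 = as.Date(c("2020-09-03", "2020-09-05")),
  list1 = c("val1", "val2")
)

# Hypothetical user input: function name -> columns it should be applied to
spec <- list(
  mean     = c("t1", "t3"),
  max      = c("t2", "t4", "date1"),
  toString = "list1"
)

# One across() call per function, e.g. across(all_of(c("t1", "t3")), mean)
calls <- lapply(names(spec), function(fn) {
  cols <- spec[[fn]]
  rlang::expr(across(all_of(!!cols), !!rlang::sym(fn)))
})

res <- df1 %>%
  group_by(id, v) %>%
  summarise(!!!calls, .groups = "drop")
```

Since no code is generated as text, column names containing spaces or other special characters need no extra quoting.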

data

df1 <- structure(list(id = c(1L, 1L), v = c(1, 1), t1 = c(1.4, 1.6), 
    t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7), date1 = structure(c(18508, 
    18510), class = "Date"), list1 = c("val1", "val2")), row.names = c(NA, 
-2L), class = "data.frame")

edited Nov 15, 2020 at 22:29

answered Nov 15, 2020 at 21:25

akrun

11 Comments

SomeDude Over a year ago

Thanks. How can I build the arguments for across, such as c(t1, t3) and mean, dynamically? Those columns are added by user selection and I don't know them ahead of time.

akrun Over a year ago

@SomeDude I guess there is a way to get the user-selected column names? I didn't find that information in your post.

SomeDude Over a year ago

Yes, I do have the column names from the user selection. But can I do across(c("user_selected_col1", "user_selected_col2"), "mean")? Does across accept such strings for column names and aggregate functions?

akrun Over a year ago

@SomeDude Yes, it does. It can accept unquoted names, quoted names, or numeric indices.

SomeDude Over a year ago

I tried this: df1 %>% group_by(id, v) %>% summarise(across(c(t1, t3), "mean"), across(c(t2, t4, date1), "max"), list1 = toString(list1), .groups = 'drop'), but it gave me the error: Error: Problem with summarise() input ..1. x Problem with across() input .fns. i Input .fns must be NULL, a function, a formula, or a list of functions/formulas. i Input ..1 is across(c(t1, t3), "mean"). i The error occurred in group 1: id = 1, v = 1. And that is before I have even passed the columns as strings like "t1", "t2", etc.
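The error above comes from .fns: across() expects a function (or a formula, or a list of them), not a function name. A minimal sketch of a fix, assuming the column and function names arrive as strings (cols and fn_name are hypothetical names), selects the columns with all_of() and resolves the name to a function with match.fun():

```r
library(dplyr)

df1 <- data.frame(id = c(1L, 1L), v = c(1, 1),
                  t1 = c(1.4, 1.6), t3 = c(0.45, 0.55))

cols    <- c("t1", "t3")  # column names as strings, e.g. from a Shiny input
fn_name <- "mean"         # function name as a string

# .cols accepts a character vector via all_of(); .fns needs an actual
# function, so the name is looked up with match.fun() first
res <- df1 %>%
  group_by(id, v) %>%
  summarise(across(all_of(cols), match.fun(fn_name)), .groups = "drop")
```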
