I have a dataframe like this:

id  v    t1   t2  t3    t4   date1        list1

1   1.0  1.4  2   0.45  3    2020-09-03   val1
1   1.0  1.6  3   0.55  3.7  2020-09-05   val2

How can I group by id and v and aggregate the columns t1, t2, t3, t4, date1 and list1, applying a different aggregate function to each of them? More specifically:

t1 -> mean
t2 -> max
t3 -> mean
t4 -> max
date1 -> max
list1 -> join, as in Python's ','.join

After aggregation, the frame should look like this:

id  v    t1   t2  t3    t4   date1        list1

1   1.0  1.5  3   0.5   3.7  2020-09-05   val1, val2

One more thing: the columns to aggregate are chosen dynamically by the user in an R Shiny app. All of the columns above exist in the dataframe, but only some of them may need to be aggregated. For example, the user could select only t1 and date1 and none of the others. My aggregation parameters therefore depend on the selected columns, and I have those column names available from the user selection. So the real question is: how can I build a dynamic aggregate query?

In Python, I could build a dict like the mapping above dynamically from the user-selected columns and use something like df.groupby(['id', 'v']).agg(agg_dict).

How can I do this in R? I looked at dplyr::summarise and data.table, but I cannot see how to aggregate all of the columns at once.

    A quick search brought up smartAgg, which I have not used myself. Commented Nov 15, 2020 at 21:27

1 Answer

We can use across to apply a function to a block of columns:

library(dplyr)
df1 %>% 
   group_by(id, v) %>% 
   summarise(across(c(t1, t3), mean),
             across(c(t2, t4, date1), max), 
             list1 = toString(list1), .groups = 'drop')

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2

If the function names and column names all come from user input:

nm1 <- c("t1", "t3")
nm2 <- c("t2", "t4", "date1")
nm3 <- c("list1")

f1 <- "mean"
f2 <- "max"
f3 <- "toString"

df1 %>%
    group_by(id, v) %>%
    summarise(across(all_of(nm1), ~ match.fun(f1)(.)),
              across(all_of(nm2), ~ match.fun(f2)(.)),
              !! nm3 := match.fun(f3)(!! rlang::sym(nm3)), .groups = 'drop')

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2
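
As a rough sketch of the dict-style mapping mentioned in the question, the same idea can be driven by a single named character vector, looking up each column's function with cur_column() inside across. The name agg_map and its contents are hypothetical stand-ins for whatever the Shiny input provides; this assumes every function name in the mapping resolves to a function that returns a single value per group.

```r
library(dplyr)

# Same sample data as in the "data" section of the answer
df1 <- data.frame(
  id = c(1L, 1L), v = c(1, 1), t1 = c(1.4, 1.6), t2 = 2:3,
  t3 = c(0.45, 0.55), t4 = c(3, 3.7),
  date1 = as.Date(c("2020-09-03", "2020-09-05")),
  list1 = c("val1", "val2")
)

# Hypothetical user selection: names are columns, values are function names
agg_map <- c(t1 = "mean", date1 = "max", list1 = "toString")

res <- df1 %>%
  group_by(id, v) %>%
  summarise(
    # cur_column() gives the name of the column currently being processed,
    # so each column is aggregated with the function mapped to it
    across(all_of(names(agg_map)),
           ~ match.fun(agg_map[[cur_column()]])(.x)),
    .groups = "drop"
  )

print(res)
```

Only the columns named in agg_map appear in the result (next to the grouping columns), so unselected columns are simply dropped.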

The aggregation can also be built as a string, parsed into expressions, and evaluated:

expr1 <- glue::glue('across(c({toString(nm1)}), {f1});',
                    'across(c({toString(nm2)}), {f2});',
                    'across(c({toString(nm3)}), {f3})')
df1 %>%
    group_by(id, v) %>%
    summarise(!!! rlang::parse_exprs(expr1), .groups = 'drop')

-output

# A tibble: 1 x 8
#     id     v    t1    t3    t2    t4 date1      list1     
#  <int> <dbl> <dbl> <dbl> <int> <dbl> <date>     <chr>     
#1     1     1   1.5   0.5     3   3.7 2020-09-05 val1, val2
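
If parsing strings feels fragile for user-supplied input, one possible alternative is to build the calls directly with rlang::call2 and splice them into summarise. This is only a sketch: agg_map is a hypothetical column-to-function mapping, not something from the question, and it assumes each function takes the column as its first argument.

```r
library(dplyr)

# Same sample data as in the "data" section of the answer
df1 <- data.frame(
  id = c(1L, 1L), v = c(1, 1), t1 = c(1.4, 1.6), t2 = 2:3,
  t3 = c(0.45, 0.55), t4 = c(3, 3.7),
  date1 = as.Date(c("2020-09-03", "2020-09-05")),
  list1 = c("val1", "val2")
)

# Hypothetical user selection: names are columns, values are function names
agg_map <- c(t2 = "max", t3 = "mean", list1 = "toString")

# Build one unevaluated call per selected column, e.g. max(t2)
agg_exprs <- lapply(names(agg_map), function(col) {
  rlang::call2(agg_map[[col]], rlang::sym(col))
})
# The list names become the output column names
names(agg_exprs) <- names(agg_map)

res2 <- df1 %>%
  group_by(id, v) %>%
  summarise(!!!agg_exprs, .groups = "drop")

print(res2)
```

Because the calls are constructed from a symbol and a function name rather than parsed from free text, a column name containing spaces or punctuation does not need any extra quoting.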

data

df1 <- structure(list(id = c(1L, 1L), v = c(1, 1), t1 = c(1.4, 1.6), 
    t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7), date1 = structure(c(18508, 
    18510), class = "Date"), list1 = c("val1", "val2")), row.names = c(NA, 
-2L), class = "data.frame")

11 Comments

Thanks. How can I build the parameters for across, like c(t1, t3) and mean, dynamically? Those columns are added by user selection and I don't know them ahead of time.
@SomeDude I guess there is an option to get the user selected column name? I didn't find that info in your post
yes I do have column names from user selection. But can I do across(c("user_selected_col1", "user_selected_col2"), "mean") ? Does across accept such strings for column names and aggregate functions?
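@SomeDude Yes, for the column names: across accepts unquoted names, quoted names, or numeric indices. The function itself, however, has to be a function or a formula, not a string.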
I tried this: df1 %>% group_by(id, v) %>% summarise(across(c(t1, t3), "mean"), across(c(t2, t4, date1), "max"), list1 = toString(list1), .groups = 'drop'), but it gave me the error: Error: Problem with summarise() input ..1. x Problem with across() input .fns. i Input .fns must be NULL, a function, a formula, or a list of functions/formulas. i Input ..1 is across(c(t1, t3), "mean"). i The error occurred in group 1: id = 1, v = 1. And that is before I have even supplied the columns as strings such as "t1", "t2".