Aggregate multiple variables using multiple different FUN in R

Question

This is an extension of the questions asked here: Aggregate / summarize multiple variables per group (e.g. sum, mean).

Specifically, if I have multiple variables to aggregate, is there a way to change the FUN each variable is aggregated by?

Example:

dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)

> dat
  ID Plot Val1 Val2
1  a    1   10   20
2  a    2   20   40
3  a    3   30   60
4  b    1   40   80
5  b    2   50  100
6  b    3   60  120
7  c    1   70  140
8  c    2   80  160
9  c    3   90  180


#Aggregate 2 variables using the *SAME* FUN
  aggregate(cbind(Val1, Val2) ~ ID, dat, sum)

  ID Val1 Val2
1  a   60  120
2  b  150  300
3  c  240  480

but notice that both variables are summed.

What if I want to take the sum of Val1 and the mean of Val2??

The best solution I have is:

merge(
  aggregate(Val1 ~ ID, dat, sum),
  aggregate(Val2 ~ ID, dat, mean),
  by = c('ID')
)

But I'm wondering if their is a cleaner/shorter way to go about doing this...

Can I do this all in Aggregate???

(I didn't see anything in the aggregate code that made it seem like this could work, but I've been wrong before...)

Example #2:

(as requested, using mtcars)

Reduce(function(df1, df2) merge(df1, df2, by = c('cyl','am'), all = T),
    list(
    aggregate(hp ~ cyl + am, mtcars, sum, na.rm = T),
    aggregate(wt ~ cyl + am, mtcars, min), 
    aggregate(qsec ~ cyl + am, mtcars, mean, na.rm = T),
    aggregate(mpg ~ cyl + am, mtcars, mean, na.rm = T)
  )
)

#I'd want a straightforward alternative like:
  aggregate(cbind(hp,wt,qsec,mpg) ~ cyl + am, mtcars, list(sum, min, mean, mean), na.rm = T)

  # ^(I know this doesn't work)

^{Note: I would prefer a base R approach, but I already realize dplyr or some other package probably does this "better"}

I'm sure there's a cleaner way + you'll still have to do some manipulating: do.call('rbind', by(dat, dat$ID, FUN = function(x) data.frame(sum_v1 = sum(x$Val1), mean_v2 = mean(x$Val2)))) — bouncyball
– bouncyball, Commented Apr 30, 2018 at 19:13
Note: in reality, my data contains dozens of variables I want to aggregate with at least 3 different functions, so I'm looking for a solution that scales up effectively :). — theforestecologist
– theforestecologist, Commented Apr 30, 2018 at 19:17
Re "dozens of variables", so you want to pass them programmatically rather than explicitly typing their names? Anyway, one clumsy variant of bouncy's approach: do.call("rbind", with(dat, tapply(seq_len(nrow(dat)), ID, FUN = function(i) data.frame(sumV1 = sum(Val1[i]), meanV2 = mean(Val2[i]))))) — Frank
– Frank, Commented Apr 30, 2018 at 19:20
I can scale up to more than 2 data.frames using Reduce (ala Simultaneously merge multiple data.frames in a list) — theforestecologist
– theforestecologist, Commented Apr 30, 2018 at 19:39
"Dozens of variables... at least 3 different functions". You should make your example more reflective of your use case. Do you want to hand-code each variable/function pair? If so, standard dplyr or data.table should work fine. Or do you have, say, a vector of column names for FUN1, another for FUN2...? Or something else? Is it a 1-1 mapping or might you want the mean and sum and max of one column, and just the mean of another? Instead of making up data, just use mtcars. You can group by the cyl column and have lots of numeric columns to play with. — Gregor Thomas
– Gregor Thomas, Commented Apr 30, 2018 at 19:47

Parfait · Accepted Answer · 2018-04-30 20:15:51Z

Consider pairwise mapping of columns and functions and then run Map to build a list of aggregated dataframes since aggregate allows string values of function names. Then run a Reduce to merge all dataframe elements together.

cols <- names(dat)[grep("Val", names(dat))]
fcts <- c("mean", "sum")

df_list <- Map(function(c, f) aggregate(.~ID, dat[c("ID", c)], FUN=f), cols, fcts)

final_df <- Reduce(function(x,y) merge(x, y, by="ID"), df_list)

final_df
#   ID Val1 Val2
# 1  a   20  120
# 2  b   50  300
# 3  c   80  480

Be sure columns and functions vectors are same length, possibly needing to repeat functions.

And to demonstrate with mtcars:

cols <- c("hp", "wt", "qsec", "mpg")
fcts <- c("sum", "min", "mean", "mean")

df_list <- Map(function(c, f) aggregate(.~cyl+am, mtcars[c("cyl", "am", c)], FUN=f), cols, fcts)

Reduce(function(x,y) merge(x,y, by=c("cyl", "am")), df_list)

#   cyl am   hp    wt     qsec      mpg
# 1   4  0  254 2.465 20.97000 22.90000
# 2   4  1  655 1.513 18.45000 28.07500
# 3   6  0  461 3.215 19.21500 19.12500
# 4   6  1  395 2.620 16.32667 20.56667
# 5   8  0 2330 3.435 17.14250 15.05000
# 6   8  1  599 3.170 14.55000 15.40000

Tung · Accepted Answer · 2018-04-30 19:16:38Z

You can use summarise from the dplyr package

library(dplyr)

dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)
dat

#>   ID Plot Val1 Val2
#> 1  a    1   10   20
#> 2  a    2   20   40
#> 3  a    3   30   60
#> 4  b    1   40   80
#> 5  b    2   50  100
#> 6  b    3   60  120
#> 7  c    1   70  140
#> 8  c    2   80  160
#> 9  c    3   90  180

dat %>% 
  group_by(ID) %>% 
  summarise(sum_val1 = sum(Val1, na.rm = TRUE),
            mean_val2 = mean(Val2, na.rm = TRUE)) %>%
  ungroup()

#> # A tibble: 3 x 3
#>   ID    sum_val1 mean_val2
#>   <fct>    <dbl>     <dbl>
#> 1 a           60        40
#> 2 b          150       100
#> 3 c          240       160

Created on 2018-04-30 by the reprex package (v0.2.0).

Collectives™ on Stack Overflow

Aggregate multiple variables using multiple different FUN in R

Example:

Example #2:

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Example:

Example #2:

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related