3

This is an extension of the questions asked here: Aggregate / summarize multiple variables per group (e.g. sum, mean).

  • Specifically, if I have multiple variables to aggregate, is there a way to change the FUN each variable is aggregated by?

Example:

dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)

> dat
  ID Plot Val1 Val2
1  a    1   10   20
2  a    2   20   40
3  a    3   30   60
4  b    1   40   80
5  b    2   50  100
6  b    3   60  120
7  c    1   70  140
8  c    2   80  160
9  c    3   90  180


#Aggregate 2 variables using the *SAME* FUN
  aggregate(cbind(Val1, Val2) ~ ID, dat, sum)

  ID Val1 Val2
1  a   60  120
2  b  150  300
3  c  240  480
  • but notice that both variables are summed.

What if I want to take the sum of Val1 and the mean of Val2??

The best solution I have is:

merge(
  aggregate(Val1 ~ ID, dat, sum),
  aggregate(Val2 ~ ID, dat, mean),
  by = c('ID')
)
  • But I'm wondering if their is a cleaner/shorter way to go about doing this...

Can I do this all in Aggregate???

  • (I didn't see anything in the aggregate code that made it seem like this could work, but I've been wrong before...)

Example #2:

(as requested, using mtcars)

Reduce(function(df1, df2) merge(df1, df2, by = c('cyl','am'), all = T),
    list(
    aggregate(hp ~ cyl + am, mtcars, sum, na.rm = T),
    aggregate(wt ~ cyl + am, mtcars, min), 
    aggregate(qsec ~ cyl + am, mtcars, mean, na.rm = T),
    aggregate(mpg ~ cyl + am, mtcars, mean, na.rm = T)
  )
)

#I'd want a straightforward alternative like:
  aggregate(cbind(hp,wt,qsec,mpg) ~ cyl + am, mtcars, list(sum, min, mean, mean), na.rm = T)

  # ^(I know this doesn't work)

Note: I would prefer a base R approach, but I already realize dplyr or some other package probably does this "better"

7
  • I'm sure there's a cleaner way + you'll still have to do some manipulating: do.call('rbind', by(dat, dat$ID, FUN = function(x) data.frame(sum_v1 = sum(x$Val1), mean_v2 = mean(x$Val2)))) Commented Apr 30, 2018 at 19:13
  • Note: in reality, my data contains dozens of variables I want to aggregate with at least 3 different functions, so I'm looking for a solution that scales up effectively :). Commented Apr 30, 2018 at 19:17
  • Re "dozens of variables", so you want to pass them programmatically rather than explicitly typing their names? Anyway, one clumsy variant of bouncy's approach: do.call("rbind", with(dat, tapply(seq_len(nrow(dat)), ID, FUN = function(i) data.frame(sumV1 = sum(Val1[i]), meanV2 = mean(Val2[i]))))) Commented Apr 30, 2018 at 19:20
  • I can scale up to more than 2 data.frames using Reduce (ala Simultaneously merge multiple data.frames in a list) Commented Apr 30, 2018 at 19:39
  • "Dozens of variables... at least 3 different functions". You should make your example more reflective of your use case. Do you want to hand-code each variable/function pair? If so, standard dplyr or data.table should work fine. Or do you have, say, a vector of column names for FUN1, another for FUN2...? Or something else? Is it a 1-1 mapping or might you want the mean and sum and max of one column, and just the mean of another? Instead of making up data, just use mtcars. You can group by the cyl column and have lots of numeric columns to play with. Commented Apr 30, 2018 at 19:47

2 Answers 2

3

Consider pairwise mapping of columns and functions and then run Map to build a list of aggregated dataframes since aggregate allows string values of function names. Then run a Reduce to merge all dataframe elements together.

cols <- names(dat)[grep("Val", names(dat))]
fcts <- c("mean", "sum")

df_list <- Map(function(c, f) aggregate(.~ID, dat[c("ID", c)], FUN=f), cols, fcts)

final_df <- Reduce(function(x,y) merge(x, y, by="ID"), df_list)

final_df
#   ID Val1 Val2
# 1  a   20  120
# 2  b   50  300
# 3  c   80  480

Be sure columns and functions vectors are same length, possibly needing to repeat functions.

And to demonstrate with mtcars:

cols <- c("hp", "wt", "qsec", "mpg")
fcts <- c("sum", "min", "mean", "mean")

df_list <- Map(function(c, f) aggregate(.~cyl+am, mtcars[c("cyl", "am", c)], FUN=f), cols, fcts)

Reduce(function(x,y) merge(x,y, by=c("cyl", "am")), df_list)

#   cyl am   hp    wt     qsec      mpg
# 1   4  0  254 2.465 20.97000 22.90000
# 2   4  1  655 1.513 18.45000 28.07500
# 3   6  0  461 3.215 19.21500 19.12500
# 4   6  1  395 2.620 16.32667 20.56667
# 5   8  0 2330 3.435 17.14250 15.05000
# 6   8  1  599 3.170 14.55000 15.40000
Sign up to request clarification or add additional context in comments.

Comments

1

You can use summarise from the dplyr package

library(dplyr)

dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)
dat

#>   ID Plot Val1 Val2
#> 1  a    1   10   20
#> 2  a    2   20   40
#> 3  a    3   30   60
#> 4  b    1   40   80
#> 5  b    2   50  100
#> 6  b    3   60  120
#> 7  c    1   70  140
#> 8  c    2   80  160
#> 9  c    3   90  180

dat %>% 
  group_by(ID) %>% 
  summarise(sum_val1 = sum(Val1, na.rm = TRUE),
            mean_val2 = mean(Val2, na.rm = TRUE)) %>%
  ungroup()

#> # A tibble: 3 x 3
#>   ID    sum_val1 mean_val2
#>   <fct>    <dbl>     <dbl>
#> 1 a           60        40
#> 2 b          150       100
#> 3 c          240       160

Created on 2018-04-30 by the reprex package (v0.2.0).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.