This is an extension of the questions asked here: Aggregate / summarize multiple variables per group (e.g. sum, mean).
- Specifically, if I have multiple variables to aggregate, is there a way to change the FUN each variable is aggregated by?
Example:
dat <- data.frame(ID = rep(letters[1:3], each = 3), Plot = rep(1:3, 3), Val1 = (1:9) * 10, Val2 = (1:9) * 20)
> dat
ID Plot Val1 Val2
1 a 1 10 20
2 a 2 20 40
3 a 3 30 60
4 b 1 40 80
5 b 2 50 100
6 b 3 60 120
7 c 1 70 140
8 c 2 80 160
9 c 3 90 180
#Aggregate 2 variables using the *SAME* FUN
aggregate(cbind(Val1, Val2) ~ ID, dat, sum)
ID Val1 Val2
1 a 60 120
2 b 150 300
3 c 240 480
- But notice that both variables are summed.
What if I want to take the sum of Val1 and the mean of Val2?
The best solution I have is:
merge(
aggregate(Val1 ~ ID, dat, sum),
aggregate(Val2 ~ ID, dat, mean),
by = c('ID')
)
- But I'm wondering if there is a cleaner/shorter way to go about doing this...
Can I do this all in aggregate?
- (I didn't see anything in the aggregate code that made it seem like this could work, but I've been wrong before...)
Example #2:
(as requested, using mtcars)
Reduce(function(df1, df2) merge(df1, df2, by = c('cyl','am'), all = T),
list(
aggregate(hp ~ cyl + am, mtcars, sum, na.rm = T),
aggregate(wt ~ cyl + am, mtcars, min),
aggregate(qsec ~ cyl + am, mtcars, mean, na.rm = T),
aggregate(mpg ~ cyl + am, mtcars, mean, na.rm = T)
)
)
#I'd want a straightforward alternative like:
aggregate(cbind(hp,wt,qsec,mpg) ~ cyl + am, mtcars, list(sum, min, mean, mean), na.rm = T)
# ^(I know this doesn't work)
Note: I would prefer a base R approach, but I realize that dplyr or some other package probably does this "better".
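To make the goal concrete, here is a minimal base R sketch of a per-column mapping of functions. The helper name `aggregate_each` and its arguments are hypothetical (nothing like it ships with base R); it only wraps the same aggregate-then-merge idea shown above, so it is a convenience rather than a faster method. It also uses the data-frame interface of aggregate, which, unlike the formula interface, does not drop rows with NA by default.

```r
# Hypothetical helper (not part of base R): summarise each column with its own function.
# data: a data.frame; keys: character vector of grouping columns;
# funs: named list mapping column names to summary functions.
aggregate_each <- function(data, keys, funs) {
  # One aggregate() call per column, each using that column's function
  pieces <- Map(function(col, f) {
    aggregate(data[col], by = data[keys], FUN = f)
  }, names(funs), funs)
  # Merge the per-column results back together on the grouping keys
  Reduce(function(a, b) merge(a, b, by = keys), pieces)
}

# Same example data as in the question
dat <- data.frame(ID = rep(letters[1:3], each = 3), Plot = rep(1:3, 3),
                  Val1 = (1:9) * 10, Val2 = (1:9) * 20)

# Sum of Val1 and mean of Val2 per ID
res <- aggregate_each(dat, "ID", list(Val1 = sum, Val2 = mean))
print(res)
```

With the example data this should give Val1 sums of 60, 150, 240 and Val2 means of 40, 100, 160 for IDs a, b, c. For the mtcars case the call would presumably be aggregate_each(mtcars, c("cyl", "am"), list(hp = sum, wt = min, qsec = mean, mpg = mean)).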
do.call('rbind', by(dat, dat$ID, FUN = function(x) data.frame(sum_v1 = sum(x$Val1), mean_v2 = mean(x$Val2))))

do.call("rbind", with(dat, tapply(seq_len(nrow(dat)), ID, FUN = function(i) data.frame(sumV1 = sum(Val1[i]), meanV2 = mean(Val2[i])))))

Reduce (a la Simultaneously merge multiple data.frames in a list)

dplyr or data.table should work fine. Or do you have, say, a vector of column names for FUN1, another for FUN2...? Or something else? Is it a 1-1 mapping or might you want the mean and sum and max of one column, and just the mean of another?

Instead of making up data, just use mtcars. You can group by the cyl column and have lots of numeric columns to play with.