1

I have a dataset that looks like this

df

ID size product   x    y 
A   1     abc    0.3   5
B   1     abc    0.8   7
C   1     abc    0.5   2
D   3     def    0.6   1

And I want to aggregate it x with mean and y with sum

so the code for aggregate both by sum is like this

df1<-aggregate(list(x=df$x, y=df$y), by=list(df$size), FUN="sum")

How can I change that code to have a dataset like this one:

df2

size     x     y 
 1      0.53   14
 3      0.6    1

Thanks in advance

1
  • 1
    I get 14 as sum Commented Apr 6, 2021 at 21:29

3 Answers 3

4

One option is collap from collapse

library(collapse)
collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 'x', fsum = 'y'))
#  size         x  y
#1    1 0.5333333 14
#2    3 0.6000000  1

NOTE: we could have a range of columns in the list i.e.

collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 2:3))
#  size         x        y
#1    1 0.5333333 4.666667
#2    3 0.6000000 1.000000

Or using the mtcars data

collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5),
          fsum = c(6, 7), fmedian = 8:9))
#       mpg cyl     disp        hp     drat     wt   qsec vs am
#1 26.66364   4 105.1364  82.63636 4.070909 25.143 210.51  1  1
#2 19.74286   6 183.3143 122.28571 3.585714 21.820 125.84  1  0
#3 15.10000   8 353.1000 209.21429 3.229286 55.989 234.81  0  0

The return can be a list or long_dupl or long or wide

collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5), 
       fsum = c(6, 7), fmedian = 8:9), return = "list")

If we want to use multiple functions for different columns, an option is to loop with Map and Reduce by mergeing the list elements to a single summarised output

Reduce(function(...) merge(..., by = 'size'), 
  Map(function(u, v) aggregate(reformulate('size', response = v), df,
      FUN = match.fun(u)), c('mean', 'sum'), c('x', 'y')))
#  size         x  y
#1    1 0.5333333 14
#2    3 0.6000000  1

data

df <- structure(list(ID = c("A", "B", "C", "D"), size = c(1L, 1L, 1L, 
3L), product = c("abc", "abc", "abc", "def"), x = c(0.3, 0.8, 
0.5, 0.6), y = c(5L, 7L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
-4L))
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks! It worked! But is there a way to do it with aggregate? I am very familiar with that command.
How can I avoid taking into account NAs? in aggregate one I just use na.rm=T
@JorgeParedes with aggregate you need na.action = NULL along with na.rm = TRUE
@user63230 you don't have to specify as by default it is TRUE fsum(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE, use.g.names = TRUE, ...)
@user63230 it's okay. But your question is pertinent if somebody wants to change those to FALSE
3

I would use the summarise() function from the tidyverse set of packages when dealing with data frames of this nature. This allows you to summarise over a group with multiple different equations. I have demonstrated what I'd expect the solution to look like below with this solution.


df %>%
  group_by(size) %>%
  summarise(
    x = mean(x),
    y = sum (y)
  )

A base r solution would require a different approach.

Comments

1

The function aggregate_multiple_fun in the SSBtools package is a wrapper to aggregate that allows multiple functions and functions of several variables.

In this case a possibility is

library(SSBtools)
aggregate_multiple_fun(df, by = df["size"], vars = c(mean = "x", sum = "y"))

Then, the output variable names will be x_mean and y_sum. For the names to be x and y:

aggregate_multiple_fun(df, by = df["size"], 
                   vars = list(x = list(mean = "x"), 
                               y = list(sum = "y")))

To make use of the parameter na.rm to mean and sum, extra parameters must be set to TRUE:

aggregate_multiple_fun(df, by = df["size"], 
                   vars = c(mean = "x", sum = "y"), 
                   forward_dots = TRUE, 
                   dots2dots = TRUE, 
                   na.rm = TRUE)

Forwarding extra parameters can be a pitfall and therefore this is not done by default. The forward_dots parameter is required, and in the case of mean, dots2dots is also required.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.