I have grouped data in R using the aggregate function.

Avg=aggregate(x$a, by=list(x$b,x$c),FUN= mean)

This gives me the mean for all the values of 'a' grouped by 'b' and 'c' of data frame 'x'.

Now, instead of taking the average of all values of 'a', I want to take the average of the 5 largest values of 'a', grouped by 'b' and 'c'.

Sample data set

a    b    c
10   G    3 
20   G    3 
22   G    3
10   G    3 
15   G    3
25   G    3
30   G    3

The aggregate call above gives me:

Group.1    Group.2    x
  G          3       18.85

But I want the average of only the 5 largest values of 'a':

Group.1    Group.2    x
  G          3       22.40

I am not able to fit the following "top 5" selection, which I am using elsewhere, into the aggregate function:

index <- order(vector, decreasing = TRUE)[1:5]
vector[index]

Can anyone please shed some light on whether this is possible?

1 Answer

You can order the data, get the top 5 entries (using head) and then apply the mean:

aggregate(x$a, by=list(x$b,x$c),FUN= function(x) mean(head(x[order(-x)], 5)))
#  Group.1 Group.2    x
#1       G       3 22.4
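One difference from the indexing approach in the question is worth a quick check: head() returns at most 5 elements, while [1:5] always returns exactly 5 positions. A minimal sketch, assuming a hypothetical group `small` with only three values (the names `small`, `top_head` and `top_index` are made up for illustration):

```r
# Hypothetical group with fewer than 5 values
small <- c(4, 9, 1)

# head() stops at the end of the vector, so all three values are averaged
top_head <- mean(head(small[order(-small)], 5))

# [1:5] asks for positions 4 and 5 as well, which come back as NA,
# so the mean of the selection is NA
top_index <- mean(small[order(small, decreasing = TRUE)[1:5]])

print(top_head)
print(top_index)
```

So with head(), a group that has fewer than 5 rows is averaged over whatever values it has, which is probably what you want here.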

If you want to do this with a custom function, I would do it like this:

myfunc <- function(vec, n){
  mean(head(vec[order(-vec)], n))
}

aggregate(x$a, by=list(x$b,x$c),FUN= function(z) myfunc(z, 5))
#  Group.1 Group.2    x
#1       G       3 22.4
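As a possible shortcut, aggregate() forwards extra arguments to FUN, so the anonymous wrapper may not be needed. A sketch that rebuilds the sample data from the question (the data frame construction is my reconstruction of the sample, and `res` is just an illustrative name):

```r
# Sample data from the question, rebuilt as a data frame
x <- data.frame(a = c(10, 20, 22, 10, 15, 25, 30), b = "G", c = 3)

# Same helper as above: mean of the n largest values
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# n = 5 is passed through aggregate()'s "..." to myfunc
res <- aggregate(x$a, by = list(x$b, x$c), FUN = myfunc, n = 5)
print(res)
```

This should give the same single row as the wrapper version; which form reads better is a matter of taste.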

I actually prefer using the formula style in aggregate which would look like this (I also use with() to be able to refer to the column names directly without using x$ each time):

with(x, aggregate(a ~ b + c, FUN= function(z) myfunc(z, 5)))
#  b c    a
#1 G 3 22.4

In this function, the parameter z receives, for each combination of b and c, the vector of a values belonging to that group. Does that make more sense now? Also note that it doesn't return an integer here but a numeric (decimal, 22.4 in this case) value.
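To see what z actually receives, it can help to split the column by group yourself. The following is only an illustration: the second group "H" is invented to have more than one group, and round(..., 3) is added on the assumption that three decimal places are wanted in the result.

```r
# Sample data from the question plus a hypothetical second group "H"
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30, 1, 2, 3),
  b = c(rep("G", 7), rep("H", 3)),
  c = 3
)

# Mean of the n largest values of a vector
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# One vector of a values per (b, c) combination; these correspond to
# what aggregate() hands to FUN as z, one group at a time
groups <- split(x$a, list(x$b, x$c))
print(groups)

# Top-5 mean per group, rounded to 3 decimal places
res <- aggregate(a ~ b + c, data = x, FUN = function(z) round(myfunc(z, 5), 3))
print(res)
```

Group "H" has only three values, so its result is simply the mean of all three.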

7 Comments

Which 'x' is the 'x' used in the function and in head?
Oh, that wasn't a good choice for a name. Try to replace the last part of the function with ..function(z) myfunc(z, 5)). Does that work? I'm currently not at my computer.
So the first argument in the function would be the vector, which in my case would be 'x$a', right? If yes, it is showing an error that argument 1 is not a vector. Thanks
Also, I think the function will return integers, whereas I want numbers with up to 3 decimal places, and that's the reason I was using the index approach at the start.
@user3812709, see my update. The function is not restricted to integers - and it doesn't return an integer for the sample data, as you can see (it's 22.4).