
I am trying to write a function using aggregate() that lets me easily specify one or more variables to group by, along with the names to give the resulting columns.

data:

   FCST_VAR OBS_SID FCST_INIT_HOUR       ME
     WIND   00000             12    4.00000
     WIND   11111             12   -0.74948
     WIND   22222             12   -0.97792
     WIND   00000             00   -2.15822
     WIND   11111             00    0.94710
     WIND   22222             00   -2.28489
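To make the examples below reproducible, the sample data can be rebuilt as a data frame. This is only a sketch: storing OBS_SID and FCST_INIT_HOUR as character (so that leading zeros such as "00000" survive) is an assumption, since the question does not say how the data was read in.

```r
# Sample data from the question. OBS_SID and FCST_INIT_HOUR are kept as
# character (an assumption) so that leading zeros are preserved.
data <- data.frame(
  FCST_VAR       = rep("WIND", 6),
  OBS_SID        = c("00000", "11111", "22222", "00000", "11111", "22222"),
  FCST_INIT_HOUR = c("12", "12", "12", "00", "00", "00"),
  ME             = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)
str(data)
```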

I can do this for a single variable to group by fairly easily:

aggregate.CNT <- function(input.data, aggregate.by) {

  # Calculate mean ME grouped by the column named in aggregate.by
  output.data <- aggregate(input.data$ME,
                 list(Station_ID = input.data[[aggregate.by]]),
                 mean, na.rm = TRUE)
  output.data
}

However, I'm stumped on two things. Firstly, I'd like to call the function with a name for the 'group by' column (instead of the default Group.1), e.g.:

aggregate.CNT <- function(input.data, aggregate.by, group.name) {

  # Calculate mean ME grouped by the column named in aggregate.by
  output.data <- aggregate(input.data$ME,
                 list(group.name = input.data[[aggregate.by]]),
                 mean, na.rm = TRUE)
  output.data
}

But this names the output column literally group.name rather than using the value passed as the group.name argument, because the tag in list(tag = value) is taken as-is and never evaluated.
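One base-R way around this, offered as a sketch rather than the only fix, is to build the grouping list first and then name it with setNames(), which uses the value of group.name instead of the literal tag. The sample data is repeated (with the ID column as character, an assumption) so the block runs on its own.

```r
aggregate.CNT <- function(input.data, aggregate.by, group.name) {
  # Build the grouping list, then name its single element with the *value*
  # of group.name; list(group.name = ...) would use the literal tag instead.
  groups <- setNames(list(input.data[[aggregate.by]]), group.name)
  aggregate(input.data$ME, groups, mean, na.rm = TRUE)
}

data <- data.frame(
  OBS_SID = c("00000", "11111", "22222", "00000", "11111", "22222"),
  ME      = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)
res <- aggregate.CNT(data, "OBS_SID", "Station_ID")
print(res)
```

The value column is still called x here because aggregate() is given a bare vector; passing input.data["ME"] instead would keep the name ME.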

Secondly, building on that: I would like to optionally specify more than one variable to group by, each with its own name. I tried using ..., but that doesn't seem possible, since the additional arguments need to be in the form:

list(arg1 = input.data[[arg2]], arg3 = input.data[[arg4]])

And I don't think there's a way to turn extra arguments into an arg3 = input.data[[arg4]] form. So I was wondering whether there is a way to pass a whole string into the function as an argument, e.g.:

aggregate.CNT <- function(input.data, aggregate.by.list) {

  # Calculate mean ME grouped by the variables described in aggregate.by.list
  output.data <- aggregate(input.data$ME,
                 list(aggregate.by.list),
                 mean, na.rm = TRUE)
}

aggregate.CNT(data, "Station_ID = data$OBS_SID, Init_Hour = data$FCST_INIT_HOUR")
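Rather than passing code as a string (which would need eval(parse(text = ...)) and is fragile), a possible alternative is to pass a named character vector of column names and build the grouping list from it. The argument name by, and falling back to the plain column name for unnamed elements, are choices made for this sketch; the sample data is rebuilt with character ID and hour columns, an assumption.

```r
# Hypothetical rewrite: 'by' is a character vector of column names, and any
# names on it become the output column names.
aggregate.CNT <- function(input.data, by) {
  # Use the supplied name where one is given, otherwise the column name itself
  out.names <- if (is.null(names(by))) by else ifelse(names(by) == "", by, names(by))
  # One grouping vector per requested column, named as decided above
  groups <- setNames(lapply(unname(by), function(col) input.data[[col]]), out.names)
  # Passing a one-column data frame keeps the value column named ME
  aggregate(input.data["ME"], groups, mean, na.rm = TRUE)
}

data <- data.frame(
  OBS_SID        = c("00000", "11111", "22222", "00000", "11111", "22222"),
  FCST_INIT_HOUR = c("12", "12", "12", "00", "00", "00"),
  ME             = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)
res <- aggregate.CNT(data, c(Station_ID = "OBS_SID", Init_Hour = "FCST_INIT_HOUR"))
print(res)
```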

If this isn't possible, suggestions for alternative methods are also greatly appreciated.

Thanks

Mal

  • Can you demonstrate what output you would like? And are you familiar with the plyr package? Depending on what you are wanting to do, I expect you will find an answer there Commented May 17, 2013 at 3:41
  • See G. Grothendieck's answer for the sort of output I would like, however I would ideally like to be able to specify the names of the columns differently to the variable names - so in his example column 'g' and 'b' would be names I define as arguments in the function, to the effect of list(FOO = data[[g]]). Will check out the plyr package though. Commented May 17, 2013 at 4:11

1 Answer


Try this:

aggregate.CNT <- function(data, by) {
    ag <- aggregate(ME ~., data[c("ME", by)], mean, na.rm = TRUE)
    if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
    ag
}

Here is an example:

> DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2), b = c(1, 1, 1, 2, 2))
> aggregate.CNT(DF, "g")
  g  ME
1 1 1.5
2 2 4.0
> aggregate.CNT(DF, c("g", "b"))
  g b  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
> aggregate.CNT(DF, c(G = "g", B = "b"))
  G B  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
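Applied to the question's data, the named form should give the Station_ID and Init_Hour columns that were asked for. This sketch repeats the answer's function and rebuilds the sample data (with the ID and hour columns as character, an assumption) so that it runs on its own.

```r
aggregate.CNT <- function(data, by) {
    # Formula method: the grouping columns keep their variable names
    ag <- aggregate(ME ~ ., data[c("ME", by)], mean, na.rm = TRUE)
    # If 'by' is named, use those names for the grouping columns
    if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
    ag
}

data <- data.frame(
  FCST_VAR       = rep("WIND", 6),
  OBS_SID        = c("00000", "11111", "22222", "00000", "11111", "22222"),
  FCST_INIT_HOUR = c("12", "12", "12", "00", "00", "00"),
  ME             = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)
res <- aggregate.CNT(data, c(Station_ID = "OBS_SID", Init_Hour = "FCST_INIT_HOUR"))
print(res)
```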

ADDED: the by vector may be named.


2 Comments

This indeed solves it partially: the Group.1 and Group.2 column names are replaced by the actual variable names I group by. It would be nice to specify the column names explicitly though, so in your example being able to specify the output column name for 'g' as 'FOO'. I'm puzzled as to why specifying the grouping variables this way names the columns after the variables, whereas in the form list(data[[by]]) they are named Group.1 (Group.2, etc.).
Have added a feature whereby the by vector components may be named and if so it uses those names. See third example.
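On the puzzle in the first comment: aggregate() on a data frame or vector gives any unnamed element of its by list a default name (Group.1, Group.2, ...), whereas the formula method builds the grouping from the model frame, so the variable names carry over. A small check, reusing the answer's DF (the object names below are only for illustration):

```r
DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2))

unnamed     <- aggregate(DF$ME, list(DF[["g"]]), mean)        # default name
named       <- aggregate(DF$ME, list(FOO = DF[["g"]]), mean)  # explicit name
via.formula <- aggregate(ME ~ g, data = DF, FUN = mean)       # formula method

names(unnamed)      # "Group.1" "x"
names(named)        # "FOO" "x"
names(via.formula)  # "g" "ME"
```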
