I am trying to write a function using aggregate() that lets me easily specify one or more variables to group by, along with the names of the resulting group columns.
Data:

    FCST_VAR  OBS_SID  FCST_INIT_HOUR  ME
    WIND      00000    12               4.00000
    WIND      11111    12              -0.74948
    WIND      22222    12              -0.97792
    WIND      00000    00              -2.15822
    WIND      11111    00               0.94710
    WIND      22222    00              -2.28489

I can do this fairly easily for a single grouping variable:

    aggregate.CNT <- function(input.data, aggregate.by) {
        # Calculate mean ME, grouped by the column named in aggregate.by
        aggregate(input.data$ME,
                  list(Station_ID = input.data[[aggregate.by]]),
                  mean, na.rm = TRUE)
    }

However, I'm stumped on two things. Firstly, I'd like to be able to call the function with a name for the 'group by' column (instead of the default Group.1), e.g.:

    aggregate.CNT <- function(input.data, aggregate.by, group.name) {
        # Calculate mean ME, grouped by the column named in aggregate.by
        aggregate(input.data$ME,
                  list(group.name = input.data[[aggregate.by]]),
                  mean, na.rm = TRUE)
    }

But this results in the output column being named group.name literally, rather than the value passed in that argument.
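
A minimal sketch of why this happens and one possible way around it, assuming base R only: inside list(), whatever is written to the left of = is taken literally as the element name, so the value of group.name is never looked at. setNames() instead assigns the name from a character value after the list has been built. The function name aggregate_named and the sample data frame dat are made up for illustration and are not part of the original code.

```r
# Hypothetical sample data mirroring the table above (IDs kept as strings
# so the leading zeros survive)
dat <- data.frame(
  FCST_VAR = "WIND",
  OBS_SID = c("00000", "11111", "22222", "00000", "11111", "22222"),
  FCST_INIT_HOUR = c("12", "12", "12", "00", "00", "00"),
  ME = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group.name is a character string used as the
# name of the grouping column in the output
aggregate_named <- function(input.data, aggregate.by, group.name) {
  # Build the list first, then name it from the *value* of group.name
  groups <- setNames(list(input.data[[aggregate.by]]), group.name)
  # Passing a one-column data frame keeps "ME" as the value column name
  aggregate(input.data["ME"], by = groups, FUN = mean, na.rm = TRUE)
}

res1 <- aggregate_named(dat, "OBS_SID", "Station_ID")
print(names(res1))
```

With this approach the grouping column takes whatever string is passed as the third argument, here Station_ID.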
Secondly, building on that, I'd like to optionally specify more than one variable to group by, each with its own name. I tried using ... but that doesn't seem to be possible, since the additional arguments need to end up in the form:

    list(arg1 = input.data[[arg2]], arg3 = input.data[[arg4]])

And I don't think there's a way to place extra arguments into an arg3 = input.data[[arg4]] format.
So I was wondering if there is a way to use an argument to insert a whole string into the function, e.g.:

    aggregate.CNT <- function(input.data, aggregate.by.list) {
        # Calculate mean ME, grouped as described by aggregate.by.list
        aggregate(input.data$ME,
                  list(aggregate.by.list),
                  mean, na.rm = TRUE)
    }

    aggregate.CNT(data, "Station_ID = data$OBS_SID, Init_Hour = data$FCST_INIT_HOUR")

If this isn't possible, suggestions for alternative methods are also greatly appreciated.
Thanks
Mal
plyr package? Depending on what you are wanting to do, I expect you will find an answer there
list(FOO = data[[g]]). Will check out the plyr package though.
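
For the multi-variable case, one possible base R sketch (an assumption about what would fit, not a confirmed solution from this thread) is to pass a named character vector in which the names are the desired output column names and the values are the columns to group by. The function name aggregate_by, its value argument, and the sample data frame dat are hypothetical.

```r
# Hypothetical sample data mirroring the table in the question
dat <- data.frame(
  FCST_VAR = "WIND",
  OBS_SID = c("00000", "11111", "22222", "00000", "11111", "22222"),
  FCST_INIT_HOUR = c("12", "12", "12", "00", "00", "00"),
  ME = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `by` is a character vector of column names,
# optionally named, e.g. c(Station_ID = "OBS_SID")
aggregate_by <- function(input.data, by, value = "ME") {
  # Entries without a name fall back to the original column name
  if (is.null(names(by))) names(by) <- by
  unnamed <- names(by) == ""
  names(by)[unnamed] <- by[unnamed]
  # One list element per grouping column, named after names(by)
  groups <- setNames(lapply(by, function(col) input.data[[col]]), names(by))
  aggregate(input.data[value], by = groups, FUN = mean, na.rm = TRUE)
}

res2 <- aggregate_by(dat, c(Station_ID = "OBS_SID", Init_Hour = "FCST_INIT_HOUR"))
res3 <- aggregate_by(dat, "OBS_SID")
print(names(res2))
```

This avoids pasting a string of R code into the call: the caller supplies only column names, and the list that aggregate() needs is built inside the function.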