Here is what I want to do: I have a data frame df defined as:
col1 <- c("a","a","a","a","a","a","b","b","b","b","b","b")
col2 <- c("z","z","x","x","z","x", "z","z","x","x","z","x")
col3 <- c(1,2,3,4,5,6,7,8,9,10,11,12)
df <- data.frame(col1,col2,col3)
and a function pred that calculates the mean, defined as:
pred <- function(subset_df){return(mean(subset_df$col3))}
I want to get a data frame out of a by() call in the format below:
col1 col2 col3_mean
a x 4.33
a z 2.67
b x 10.33
b z 8.67
I am currently using by() to partition the data into its strata and apply pred(), which calculates the mean, to each one:
by_keys <- c("col1","col2")
data_sub <- by(df, df[,by_keys], pred)
data_sub <- do.call(rbind, data_sub)
I am getting this error: "Error in do.call(rbind, data_sub) : second argument must be a list"
I tried a solution from a similar thread, but I don't get col1 and col2 as in the desired format:
as.data.frame(vapply(data_sub,unlist,unlist(data_sub[[1]])))
I would appreciate any help on this.
aggregate(col3~.,df,mean)
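To get from the one-liner above to the exact layout asked for, here is a minimal sketch. It assumes the sample data from the question, spells out the grouping columns instead of using `.`, and renames and sorts the result. The second half is an alternative, not the asker's original by() code: it uses split() so that the asker's own pred() still receives a whole subset data frame. The names `res`, `pieces`, `rows`, and `out` are just illustrative.

```r
# Sample data and helper from the question
col1 <- c("a","a","a","a","a","a","b","b","b","b","b","b")
col2 <- c("z","z","x","x","z","x","z","z","x","x","z","x")
col3 <- c(1,2,3,4,5,6,7,8,9,10,11,12)
df <- data.frame(col1, col2, col3)
pred <- function(subset_df) { return(mean(subset_df$col3)) }

# Option 1: aggregate() returns a data frame with the grouping columns kept
res <- aggregate(col3 ~ col1 + col2, data = df, FUN = mean)
names(res)[names(res) == "col3"] <- "col3_mean"  # rename the value column
res <- res[order(res$col1, res$col2), ]          # sort by col1, then col2
rownames(res) <- NULL
print(res)

# Option 2: keep pred() and work on whole subsets, using split() in place of by()
by_keys <- c("col1", "col2")
pieces <- split(df, df[, by_keys], drop = TRUE)  # one data frame per stratum
rows <- lapply(pieces, function(d) {
  # each stratum has a single col1/col2 value, so take the first
  data.frame(col1 = d$col1[1], col2 = d$col2[1], col3_mean = pred(d))
})
out <- do.call(rbind, rows)                      # rows is a list, so this works
out <- out[order(out$col1, out$col2), ]
rownames(out) <- NULL
print(out)
```

The do.call(rbind, ...) call in the question fails because by() returns an object of class "by" (here a numeric array), not a list; split() followed by lapply() yields a plain list, which is what do.call() expects.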