
I'm new to R. I'd like to compute several statistics on a numeric column (say, column C) of a data frame (a data.table, dt) for each combination of factor columns (say, columns A and B). First I want the results grouped by both A and B, and then the same statistics grouped by A alone and by B alone. I've written the code below. I have a list of the groupings I'd like to test (groupList), and on each iteration of the loop I pass one element of that list as the argument to "by". However, as you can surely see, it doesn't work: R doesn't recognize the elements of the list as arguments to "by". Any ideas on how to make this work? Any pointers or suggestions are welcome and appreciated.

groupList <- list(".(A, B)", "A", "B")

for(i in 1:length(groupList)){
  output <- dt[,list(mean=mean(C),
                     sd=sd(C),
                     min=min(C),
                     median=median(C),
                     max=max(C)),
               by = groupList[i]]

  # Here insert code to save each output
}
  • Can you make a minimal subset of your data frame, say df, and paste the output of dput(df) into your question? Commented Apr 27, 2018 at 2:02
  • However, if I were you, I would do three separate split-apply-combine operations rather than creating a list after grouping by the two variables. This is easily done with, e.g., the dplyr::summarise family of functions. Commented Apr 27, 2018 at 2:17

3 Answers


I guess the aggregate function can solve your problem. Say you have a data frame df containing three columns A, B, and C:

df <- data.frame(A = rep(letters[1:3], 3), B = rep(letters[4:6], each = 3), C = 1:9)

To calculate the mean of C by factor A, try:

aggregate(C ~ A, data = df, FUN = mean)

By factor B:

aggregate(C ~ B, data = df, FUN = mean)

By both factors A and B:

aggregate(C ~ A + B, data = df, FUN = mean)
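The question asks for five statistics over three groupings, so the same idea can be extended with a small helper that returns all five values and a list of formulas playing the role of groupList. This is a sketch, not part of the original answer: `stats5`, `formulas`, and `results` are hypothetical names. When FUN returns a vector, aggregate() stores the statistics as a single matrix column, so `do.call(data.frame, ...)` is used here to flatten it into ordinary columns (C.mean, C.sd, and so on).

```r
df <- data.frame(A = rep(letters[1:3], 3), B = rep(letters[4:6], each = 3), C = 1:9)

# Hypothetical helper: all five statistics from the question as a named vector
stats5 <- function(v) {
  c(mean = mean(v), sd = sd(v), min = min(v), median = median(v), max = max(v))
}

# One formula per grouping, mirroring groupList (A and B together, A alone, B alone)
formulas <- list(C ~ A + B, C ~ A, C ~ B)

# aggregate() once per formula; do.call(data.frame, ...) flattens the matrix column
results <- lapply(formulas, function(f) {
  do.call(data.frame, aggregate(f, data = df, FUN = stats5))
})
names(results) <- c("A_B", "A", "B")

results$A  # one row per level of A, columns A, C.mean, C.sd, C.min, C.median, C.max
```

Note that in this toy df every A/B combination has a single row, so C.sd is NA in results$A_B.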

Your groupList can be restructured as a list of character vectors. Then you can either use lapply or keep the existing for loop, adding eval() so that the by= input is interpreted properly:

library(data.table)

dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)

groupList <- list(c("A", "B"), c("A"), c("B"))

# Option 1: lapply, passing each character vector straight to by=
lapply(
  groupList,
  function(x) {
    dt[, .(mean=mean(C), sd=sd(C)), by=x]
  }
)

# Option 2: the original for loop, using eval() inside by=
out <- vector("list", length(groupList))
for(i in seq_along(groupList)){
  out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}

str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame':      6 obs. of  4 variables:
#  ..$ A   : int [1:6] 1 1 1 2 2 2
#  ..$ B   : int [1:6] 1 2 3 3 4 5
#  ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
#  ..$ sd  : num [1:6] 0.707 0.707 NA NA 0.707 ...
#  ..- attr(*, ".internal.selfref")=<externalptr> 
# $ :Classes ‘data.table’ and 'data.frame':      2 obs. of  3 variables:
#  ..$ A   : int [1:2] 1 2
#  ..$ mean: num [1:2] 3 8
#  ..$ sd  : num [1:2] 1.58 1.58
#  ..- attr(*, ".internal.selfref")=<externalptr> 
# $ :Classes ‘data.table’ and 'data.frame':      5 obs. of  3 variables:
#  ..$ B   : int [1:5] 1 2 3 4 5
#  ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
#  ..$ sd  : num [1:5] 0.707 0.707 0.707 0.707 0.707
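A sketch extending the lapply version to all five statistics from the question and to the "save each output" step, assuming the same toy dt. Each result is kept in a list named after its grouping, and rbindlist() can stack them into one table, filling grouping columns that a set does not use with NA. The names `out`, `combined`, and the CSV file name in the comment are illustrative only. median() is wrapped in as.double() because median() of an odd-length integer vector returns an integer while even-length groups give a double; the wrapper keeps the column type the same in every group.

```r
library(data.table)

dt <- data.table(A = rep(1:2, each = 5), B = rep(1:5, each = 2), C = 1:10)
groupList <- list(c("A", "B"), "A", "B")

# Five summaries per grouping; by = x accepts a character vector of column names
out <- lapply(groupList, function(x) {
  dt[, .(mean = mean(C), sd = sd(C), min = min(C),
         median = as.double(median(C)), max = max(C)), by = x]
})

# Name each table after its grouping: "A_B", "A", "B"
names(out) <- vapply(groupList, paste, character(1), collapse = "_")

# Save each one separately, e.g. fwrite(out$A, "stats_by_A.csv") (hypothetical file name),
# or stack them; idcol records which grouping each row came from
combined <- rbindlist(out, fill = TRUE, idcol = "grouping")
```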

  • Thank you. Two very elegant solutions. I'll take the one that uses lapply and keep in mind the one using the for loop.

For demonstration, I used the mtcars data set. Here is one way with the dplyr package.

library(dplyr)

# character vector naming the summary functions you need
describe <- c("mean", "sd", "min", "median", "max")

# group by the variable gear
mtcars %>%
  group_by(gear) %>%
  summarise_at(vars(mpg), describe) 

# group by the variable carb
mtcars %>%
  group_by(carb) %>%
  summarise_at(vars(mpg), describe) 

# group by both gear and carb
mtcars %>%
  group_by(gear, carb) %>%
  summarise_at(vars(mpg), describe) 
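summarise_at() and the other scoped verbs were superseded in dplyr 1.0.0 by across(). A sketch of the same summaries in the newer syntax, looping over the groupings the way the question's groupList does. `groupings`, `describe_fns`, and `results` are hypothetical names, and `group_by(!!!rlang::syms(g))` is one of several ways to group by column names held in a character vector.

```r
library(dplyr)

# Named list of functions; across() names the outputs mpg_mean, mpg_sd, ...
describe_fns <- list(mean = mean, sd = sd, min = min, median = median, max = max)

# Groupings to try: gear and carb together, then each alone
groupings <- list(c("gear", "carb"), "gear", "carb")

results <- lapply(groupings, function(g) {
  mtcars %>%
    group_by(!!!rlang::syms(g)) %>%      # turn the strings into column symbols
    summarise(across(mpg, describe_fns), .groups = "drop")
})
names(results) <- vapply(groupings, paste, character(1), collapse = "_")

results$gear
```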
