I have a CSV dataset (call it data) as follows:

CLASS      CoverageT1      CoverageT2       CoverageT3
Gamma      90              80               75
Gamma      89              72               79
Gamma      92              86               75
Alpha      50              80               67
Alpha      53              78               60
Alpha      58              81               75

I would like to retrieve the unique classes and calculate the average for each coverage column.

What I've done so far is the following:

classes <- subset(data, select = c(CLASS))
unique_classes <- unique(classes)

for(x in unique_classes){
  cove <- subset(data, CLASS == x , select=c(CoverageT1:CoverageT3))
  average <- colMeans(cove)
  print(cove)
}

This prints the following output:

   CoverageT1    CoverageT2    CoverageT3
1  90            80            75
3  92            86            75
4  50            80            67
6  58            81            75

I want to retrieve the coverage values for each class and then calculate the average. When I print the retrieved coverage values, I get only some of the rows and the others are missing!

Can someone help me solve this issue?

Thanks

4 Answers

4

Your code isn't working because, amongst other things, you reassign average on each iteration, so the previous value is lost.
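Another likely cause of the missing rows: unique_classes is a one-column data frame, so for (x in unique_classes) runs once with x set to the whole column c("Gamma", "Alpha"), and CLASS == x then recycles that vector along the six rows, which keeps rows 1, 3, 4 and 6. A minimal sketch of a corrected loop, assuming data matches the table in the question (the read.table call and the averages variable are illustrative, not part of the original post):

```r
# Illustrative copy of the question's data
data <- read.table(text = "CLASS CoverageT1 CoverageT2 CoverageT3
Gamma 90 80 75
Gamma 89 72 79
Gamma 92 86 75
Alpha 50 80 67
Alpha 53 78 60
Alpha 58 81 75", header = TRUE)

# Iterate over the class labels themselves, not over a data frame
unique_classes <- unique(as.character(data$CLASS))

# Store one vector of column means per class instead of overwriting it
averages <- list()
for (x in unique_classes) {
  cove <- subset(data, CLASS == x, select = CoverageT1:CoverageT3)
  averages[[x]] <- colMeans(cove)
}

# Bind the per-class means into a matrix with one row per class
averages <- do.call(rbind, averages)
print(averages)
```

The loop is kept only to stay close to the question's code; grouped helpers such as aggregate or by avoid it entirely.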

There are several ways to do what you are trying to do. This would be my approach:

library(dplyr) 

data %>% group_by(CLASS) %>% summarise_all(mean)

4

Another option using aggregate

aggregate(. ~ CLASS, data = data, FUN = mean)

2 Comments

Grrr, tried aggregate(CoverageT1 + CoverageT2 + CoverageT3 ~ CLASS, data = xy, FUN = mean) and it didn't work. Didn't think of using the dot notation...
If there is more than one variable on the left-hand side, you can use cbind: aggregate(cbind(CoverageT1, CoverageT2, CoverageT3) ~ CLASS, data = xy, FUN = mean). In this case, since all variables are used, the dot notation works as well.
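
To illustrate the difference discussed in these comments: with + on the left-hand side, the formula is evaluated arithmetically, so aggregate returns a single column holding the mean of the row-wise totals, whereas cbind keeps the three columns separate. A small sketch, assuming xy holds the question's data (rebuilt here so the snippet stands alone; summed and separate are illustrative names):

```r
# Illustrative copy of the question's data
xy <- read.table(text = "CLASS CoverageT1 CoverageT2 CoverageT3
Gamma 90 80 75
Gamma 89 72 79
Gamma 92 86 75
Alpha 50 80 67
Alpha 53 78 60
Alpha 58 81 75", header = TRUE)

# '+' adds the three columns row by row before averaging:
# the result has CLASS plus one column of mean totals
summed <- aggregate(CoverageT1 + CoverageT2 + CoverageT3 ~ CLASS,
                    data = xy, FUN = mean)

# cbind() keeps the columns apart: the result has CLASS plus one mean per column
separate <- aggregate(cbind(CoverageT1, CoverageT2, CoverageT3) ~ CLASS,
                      data = xy, FUN = mean)

print(summed)
print(separate)
```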

2

Taking your idea and wrapping it in by.

xy <- read.table(text = "CLASS      CoverageT1      CoverageT2       CoverageT3
Gamma      90              80               75
Gamma      89              72               79
Gamma      92              86               75
Alpha      50              80               67
Alpha      53              78               60
Alpha      58              81               75", header = TRUE)


out <- by(data = xy[, -1], INDICES = list(xy$CLASS), FUN = colMeans)
out <- do.call(rbind, out)
out

      CoverageT1 CoverageT2 CoverageT3
Alpha   53.66667   79.66667   67.33333
Gamma   90.33333   79.33333   76.33333

1

This is how I solved it:

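coverage <- data[, -1]  # assumption: the coverage columns; the original snippet never defines coverage
coverage_all <- aggregate(coverage, list(class = data$CLASS), mean)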
