I'm trying to use R's by function to get column means for subsets of a data frame. For example, consider this data frame:

> z = data.frame(labels=c("a","a","b","c","c"),data=matrix(1:20,nrow=5))
> z
  labels data.1 data.2 data.3 data.4
1      a      1      6     11     16
2      a      2      7     12     17
3      b      3      8     13     18
4      c      4      9     14     19
5      c      5     10     15     20

I can use R's by command to get the column means according to the labels column:

> by(z[,2:5],z$labels,colMeans)
z$labels: a
data.1 data.2 data.3 data.4
   1.5    6.5   11.5   16.5
------------------------------------------------------------
z$labels: b
data.1 data.2 data.3 data.4
     3      8     13     18
------------------------------------------------------------
z$labels: c
data.1 data.2 data.3 data.4
   4.5    9.5   14.5   19.5

But how do I coerce the output back to a data frame? as.data.frame doesn't work...

> as.data.frame(by(z[,2:5],z$labels,colMeans))
Error in as.data.frame.default(by(z[, 2:5], z$labels, colMeans)) :
  cannot coerce class '"by"' into a data.frame

3 Answers

You can use ddply from the plyr package:

library(plyr)
ddply(z, .(labels), numcolwise(mean))
  labels data.1 data.2 data.3 data.4
1      a    1.5    6.5   11.5   16.5
2      b    3.0    8.0   13.0   18.0
3      c    4.5    9.5   14.5   19.5

Or aggregate from the stats package:

aggregate(z[,-1], by=list(z$labels), mean)
  Group.1 data.1 data.2 data.3 data.4
1       a    1.5    6.5   11.5   16.5
2       b    3.0    8.0   13.0   18.0
3       c    4.5    9.5   14.5   19.5
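
If the Group.1 column name is a nuisance, aggregate's formula interface keeps the name of the grouping column. A minimal sketch, rebuilding the question's z so that it runs on its own (the name means is only for illustration):

```r
# Rebuild the example data frame from the question.
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# "." stands for every column other than the grouping column 'labels'.
means <- aggregate(. ~ labels, data = z, FUN = mean)
means
```

The result should hold the same means as above, with the first column called labels. One thing to keep in mind: the formula method drops rows containing NA by default (na.action = na.omit), which makes no difference for this example but may for real data.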

Or dcast from the reshape2 package:

library(reshape2)
dcast(melt(z, id.vars = "labels"), labels ~ variable, mean)

Using sapply:

t(sapply(split(z[,-1], z$labels), colMeans))
  data.1 data.2 data.3 data.4
a    1.5    6.5   11.5   16.5
b    3.0    8.0   13.0   18.0
c    4.5    9.5   14.5   19.5
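
Note that this last variant returns a matrix, not a data frame. If a data frame is required, wrapping the result in as.data.frame should be enough. A small sketch (the names m and means_df are made up here):

```r
# Rebuild the example data frame from the question.
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# split() gives one sub-data-frame per label;
# colMeans() reduces each of them to a named numeric vector.
m <- t(sapply(split(z[, -1], z$labels), colMeans))
is.matrix(m)

# Convert the matrix; the labels stay in the row names.
means_df <- as.data.frame(m)
means_df
```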

1 Comment

Great! All of these do what I was looking for, though aggregate seems like the simplest (and the simplest for me to figure out again in the future). Thanks!

The output of by is a list, so you can use do.call to rbind its elements and then convert the resulting matrix:

as.data.frame(do.call("rbind",by(z[,2:5],z$labels,colMeans)))
  data.1 data.2 data.3 data.4
a    1.5    6.5   11.5   16.5
b    3.0    8.0   13.0   18.0
c    4.5    9.5   14.5   19.5
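
To see why this works, and to get the group labels as a regular column instead of row names, something like the following may help. It is a sketch that assumes the function passed to by returns a named vector of the same length for every group; res, m and out are names made up here.

```r
# Rebuild the example data frame from the question.
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

res <- by(z[, 2:5], z$labels, colMeans)

# A "by" object is a list-mode array with one element per group.
is.list(res)
names(res)

# Stack the per-group vectors into a matrix, one row per group.
m <- do.call(rbind, res)

# Move the group names from the row names into a proper column.
out <- data.frame(labels = rownames(m), m, row.names = NULL)
out
```

Passing row.names = NULL explicitly keeps data.frame from reusing the matrix row names, so out ends up with plain 1, 2, 3 row names and the labels in their own column.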

Dealing with the output of by can be really annoying. I just found a way to extract what you want as a data frame, and you won't need extra packages.

So, if you do this:

aux <- by(z[,2:5],z$labels,colMeans)

You can then turn it into a data frame by doing this:

aux_df <- as.data.frame(t(sapply(aux, identity)))

Here aux is a one-dimensional list holding one vector of column means per label. sapply gathers those vectors into the columns of a matrix, which I then transpose and pass to as.data.frame.

I hope that helps.