converting output of R's "by" command to data frame

Question

I'm trying to use R's by command to get column means for subsets of a data frame. For example, consider this data frame:

> z = data.frame(labels=c("a","a","b","c","c"),data=matrix(1:20,nrow=5))
> z
  labels data.1 data.2 data.3 data.4
1      a      1      6     11     16
2      a      2      7     12     17
3      b      3      8     13     18
4      c      4      9     14     19
5      c      5     10     15     20

I can use R's by command to get the column means according to the labels column:

> by(z[,2:5],z$labels,colMeans)
z$labels: a
data.1 data.2 data.3 data.4
   1.5    6.5   11.5   16.5
------------------------------------------------------------
z$labels: b
data.1 data.2 data.3 data.4
     3      8     13     18
------------------------------------------------------------
z$labels: c
data.1 data.2 data.3 data.4
   4.5    9.5   14.5   19.5

But how do I coerce the output back to a data frame? as.data.frame doesn't work...

> as.data.frame(by(z[,2:5],z$labels,colMeans))
Error in as.data.frame.default(by(z[, 2:5], z$labels, colMeans)) :
  cannot coerce class '"by"' into a data.frame

Jilber Urbina · Accepted Answer · 2012-09-14 10:49:02Z

11

You can use ddply from the plyr package:

library(plyr)
ddply(z, .(labels), numcolwise(mean))
  labels data.1 data.2 data.3 data.4
1      a    1.5    6.5   11.5   16.5
2      b    3.0    8.0   13.0   18.0
3      c    4.5    9.5   14.5   19.5

Or aggregate from stats (part of base R, so no extra package is needed):

aggregate(z[,-1], by=list(z$labels), mean)
  Group.1 data.1 data.2 data.3 data.4
1       a    1.5    6.5   11.5   16.5
2       b    3.0    8.0   13.0   18.0
3       c    4.5    9.5   14.5   19.5

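Or dcast from the reshape2 package (naming the id column explicitly avoids melt's "Using labels as id variables" message):

library(reshape2)
dcast(melt(z, id.vars = "labels"), labels ~ variable, mean)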

Or using sapply over the groups produced by split (this returns a matrix, with the labels as row names):

t(sapply(split(z[,-1], z$labels), colMeans))
  data.1 data.2 data.3 data.4
a    1.5    6.5   11.5   16.5
b    3.0    8.0   13.0   18.0
c    4.5    9.5   14.5   19.5
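
If the Group.1 column name in the aggregate result is a nuisance, the formula interface of aggregate may be worth a look, since it keeps the original column name. This is only a minimal sketch, assuming the same z as in the question (rebuilt here so the snippet runs on its own). One difference to be aware of: the formula method drops rows containing NA by default (na.action = na.omit), which makes no difference for this data.

```r
# Same example data as in the question
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

# "." stands for every column other than the grouping column
agg <- aggregate(. ~ labels, data = z, FUN = mean)
agg
#   labels data.1 data.2 data.3 data.4
# 1      a    1.5    6.5   11.5   16.5
# 2      b    3.0    8.0   13.0   18.0
# 3      c    4.5    9.5   14.5   19.5
```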

edited Sep 14, 2012 at 10:49

answered Sep 12, 2012 at 13:31


1 Comment

Andrew Over a year ago

Great! All of them do what I was looking for, though aggregate seems like the simplest (and the simplest for me to figure out again in the future). Thanks!

James · Accepted Answer · 2012-09-12 13:34:22Z

9

The output of by is a list, so you can use do.call to rbind its elements into a matrix and then convert that to a data frame:

as.data.frame(do.call("rbind",by(z[,2:5],z$labels,colMeans)))
  data.1 data.2 data.3 data.4
a    1.5    6.5   11.5   16.5
b    3.0    8.0   13.0   18.0
c    4.5    9.5   14.5   19.5
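
To make the "it is a list" point concrete, here is a small sketch (the names res, m and out are just illustrative) that inspects the by object and then moves the group labels from the row names back into an ordinary column, assuming the same z as in the question:

```r
# Same example data as in the question
z <- data.frame(labels = c("a", "a", "b", "c", "c"),
                data = matrix(1:20, nrow = 5))

res <- by(z[, 2:5], z$labels, colMeans)

# A "by" object is a list-mode array with one element per group
is.list(res)   # TRUE
names(res)     # "a" "b" "c"

# Stack the per-group vectors into a matrix; the group names become row names
m <- do.call(rbind, res)

# Convert to a data frame and turn the row names into a labels column
out <- as.data.frame(m)
out <- cbind(labels = rownames(out), out)
rownames(out) <- NULL
out
```

The result has the same shape as the ddply and aggregate outputs: a labels column followed by data.1 to data.4.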

answered Sep 12, 2012 at 13:34


Diego Rodrigues · Accepted Answer · 2016-09-14 10:34:17Z

0

Dealing with the by output can be really annoying. I found a way to extract what you want as a data frame without needing extra packages.

So, if you do this:

aux <- by(z[,2:5],z$labels,colMeans)

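You can then turn it into a data frame by doing this:

  aux_df <- as.data.frame(t(sapply(aux, identity)))

I'm just collecting every group's result from aux into a matrix, transposing it so that the groups become rows, and using as.data.frame. (Indexing aux by rows and columns does not work here: with a single grouping factor aux has only one dimension, so ncol(aux) is NA.)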

I hope that helps.

answered Sep 14, 2016 at 10:34
