df1=data.frame(c(2,1,2),c(1,2,3,4,5,6),seq(141,170)) #create data.frame
names(df1) = c("gender","age","height") #column names
df1$gender <- factor(df1$gender,
                     levels=c(1,2),
                     labels=c("female","male")) #gives levels and labels to gender
df1$age <- factor(df1$age,
                  levels=c(1,2,3,4,5,6),
                  labels=c("16-24","25-34","35-44","45-54","55-64","65+")) #gives levels and labels to age groups

I am looking to produce a summary of the height values subsetted by gender and then age.

Using the subset and by functions as below provides the output I want:

females<-subset(df1,gender=="female") #subsetting by gender (compare against the factor labels, not the original codes 1/2)
males<-subset(df1,gender=="male")

foutput=by(females$height,females$age,summary) #producing summary subsetted by age
moutput=by(males$height,males$age,summary)

However, I require it to be in a data.frame so that I can export these results alongside frequency tables using XLConnect.

Is there a way to convert the output to a data.frame, or an elegant alternative, possibly using plyr?

2 Answers

Here's one approach using plyr:

> ddply(df1, c("gender", "age"), function(x) summary(x$height))
  gender   age Min. 1st Qu. Median Mean 3rd Qu. Max.
1 female 25-34  142     148    154  154     160  166
2 female 55-64  145     151    157  157     163  169
3   male 16-24  141     147    153  153     159  165
4   male 35-44  143     149    155  155     161  167
5   male 45-54  144     150    156  156     162  168
6   male   65+  146     152    158  158     164  170

5 Comments

That looks ideal. I thought plyr might be the solution!
@BuckyO - I find it hard to beat plyr for ease of use and consistency between different tasks. You may run into performance issues with large data and/or many groups, but for most "mortal" tasks - I find it quite nice. Good luck!
Thanks for that. I'll keep in mind the performance issues you have mentioned. Approved this answer as I have tried it with more subsets and other functions and it has worked.
Could you explain function(x) to me? I have looked again at where I used this function with count and another subset; in that case it adds a column called x.
@BuckyO - function(x) is an "anonymous function". df1 is broken up into "chunks" by the combinations of age and gender, and each chunk is passed to function(x); we're then able to reference that chunk as x in the call to summary. Here's a bit more background on anonymous functions, and some specific insight into R: en.wikipedia.org/wiki/Anonymous_function#R
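
To make the "chunk" idea concrete, here is a rough base-R sketch of the same split-and-apply pattern. It is only an approximation of what ddply does, not plyr's actual implementation; the names chunks, result, summarise_height and result_named are made up for this illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(gender = c(2, 1, 2),
                  age    = c(1, 2, 3, 4, 5, 6),
                  height = seq(141, 170))
df1$gender <- factor(df1$gender, levels = c(1, 2),
                     labels = c("female", "male"))
df1$age <- factor(df1$age, levels = 1:6,
                  labels = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"))

# Break df1 into one data.frame per gender/age combination;
# drop = TRUE discards combinations that never occur in the data
chunks <- split(df1, list(df1$gender, df1$age), drop = TRUE)

# The anonymous function receives each chunk in turn as x
result <- lapply(chunks, function(x) summary(x$height))

# The same thing with a named function (hypothetical name), for comparison
summarise_height <- function(x) summary(x$height)
result_named <- lapply(chunks, summarise_height)

names(result)            # one entry per gender/age chunk
result[["female.25-34"]] # summary of height for that chunk
```

The anonymous version saves you from naming a function that is only used once; otherwise the two calls to lapply do the same work.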

The output from by is really a list, but it looks different because of the print.by method.
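
A quick way to check this for yourself, sketched with the built-in mtcars data (the variable name b is just for illustration):

```r
b <- by(mtcars$hp, mtcars$cyl, summary)

class(b)     # the "by" class is what routes printing through print.by
is.list(b)   # the underlying storage is a list
names(b)     # one element per group
b[["4"]]     # pull out the summary for a single group
unclass(b)   # without the class, it prints as an ordinary list structure
```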

So you can use do.call to rbind the elements into a matrix and then call data.frame on that:

data.frame(do.call(rbind,by(mtcars$hp,mtcars$cyl,summary)),check.names=FALSE)
  Min. 1st Qu. Median   Mean 3rd Qu. Max.
4   52    65.5   91.0  82.64    96.0  113
6  105   110.0  110.0 122.30   123.0  175
8  150   176.2  192.5 209.20   241.2  335

Note the use of the check.names argument to avoid column names sanitisation.
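
For the data in the question, with two grouping factors, one possible adaptation is to combine gender and age into a single grouping factor first. This is a sketch rather than the only way to do it; interaction(..., drop = TRUE) is my choice here so that empty gender/age combinations are left out, and grp and out are illustrative names.

```r
# Rebuild the example data from the question
df1 <- data.frame(gender = c(2, 1, 2),
                  age    = c(1, 2, 3, 4, 5, 6),
                  height = seq(141, 170))
df1$gender <- factor(df1$gender, levels = c(1, 2),
                     labels = c("female", "male"))
df1$age <- factor(df1$age, levels = 1:6,
                  labels = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"))

# One combined grouping factor; unused combinations are dropped
grp <- interaction(df1$gender, df1$age, drop = TRUE)

# Same do.call/rbind idea as above, applied per gender/age group
out <- data.frame(do.call(rbind, by(df1$height, grp, summary)),
                  check.names = FALSE)

out   # group labels such as "female.25-34" end up as row names
```

If you need gender and age as proper columns for the export, you would still have to split the row names or add them back separately.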

3 Comments

Thanks for your answer, and especially for the note about print.by. The minimum values here are lower than the minimum height value; is this an example from another data set?
@BuckyO Yes, this is from the built-in mtcars data set. I'm using IE7 and have difficulty copying multiline data examples on here.
I've accepted Chase's answer, but yours was also very useful and I'll try it. Thanks.
