
I would like to make a boxplot with a numerical x variable ("age") binned into fixed ranges. For example, I want one box for each of the age ranges "1-5", "5-10", "10-13", "13-20", and ">20", with a line connecting the medians of the boxes. If possible, I would also like to make such a boxplot for each "season" (a series of boxes per season, each in a separate panel). My data is below.

season        age     value1  value2
wet.summer    14.193  16.786  22.66
fall          3.944   6.432   10.272
fall          9.994   16.111  22.737
fall          3.101   6.507   14.372
winter        2.631   NA      13.889
winter        20.746  22.629  29.27
winter        15.93   21.356  36.454
winter        7.384   7.419   11.851
spring        22.955  25.793  42.038
spring        10.876  16.532  24.188
spring        25.724  27.272  50.447
early.summer  10.825  16.452  23.147
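To make the question reproducible, the sample table above can be typed in as a data frame. This is a minimal sketch; it uses the name `mydf`, which is what the answer below calls the data (the question's own code calls it `mydata`).

```r
# Sample data from the question, entered so the code in this thread runs as-is.
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
season        age     value1  value2
wet.summer    14.193  16.786  22.66
fall          3.944   6.432   10.272
fall          9.994   16.111  22.737
fall          3.101   6.507   14.372
winter        2.631   NA      13.889
winter        20.746  22.629  29.27
winter        15.93   21.356  36.454
winter        7.384   7.419   11.851
spring        22.955  25.793  42.038
spring        10.876  16.532  24.188
spring        25.724  27.272  50.447
early.summer  10.825  16.452  23.147
")

# read.table() turns the literal NA into a missing value.
str(mydf)
```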

Below is the code I used to make a different type of boxplot. I tried to modify it to get what I want, but failed. Thank you very much for your help! :)

library(ggplot2)
p <- ggplot(mydata, aes(factor(age), value1))
p + geom_boxplot(aes(fill = factor(season))) +
  stat_summary(fun = median, geom = "line", aes(group = 1)) +
  stat_summary(fun = median, geom = "point") + scale_x_discrete(breaks = seq(0, 30, by = 1)) +
  theme(legend.position = c(.1, .9), legend.text = element_text(colour = "black", size = 14, face = "bold"))

1 Answer


Here is one way. I reshaped your data into long format using gather(), then used cut() to create the age-group variable you described. The sample data has too few points for meaningful boxes, but I tried to create the graphic you described. I hope this is what you are after.
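To see how cut() assigns ages to the ranges, here is a small base-R sketch. The ages 3.944, 14.193 and 25.724 come from the question; 5 and 0.5 are made-up values chosen only to show the edges. By default cut() uses right-closed intervals (`right = TRUE`), so an age of exactly 5 lands in "1-5", and an age at or below the lowest break (1) becomes NA.

```r
# Bin a few example ages into the question's ranges.
# Intervals are (1,5], (5,10], (10,13], (13,20], (20,Inf] because right = TRUE by default.
ages <- c(3.944, 5, 14.193, 25.724, 0.5)
age_group <- cut(ages,
                 breaks = c(1, 5, 10, 13, 20, Inf),
                 labels = c("1-5", "5-10", "10-13", "13-20", "20 +"))

# 0.5 is below the first break, so its group is NA.
print(data.frame(ages, age_group))
```

If ages below 1 occur in the real data, lowering the first break (for example to 0) keeps them from turning into NA.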

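library(dplyr)
library(tidyr)
library(ggplot2)

# mydf is your data frame (called mydata in the question).
# Stack value1 and value2 into one column ("value"); "whatever" records which one each row came from.
gather(mydf, whatever, value, -(season:age)) %>%
  # Bin age into the requested ranges (intervals are closed on the right).
  mutate(group = cut(age, breaks = c(1, 5, 10, 13, 20, Inf),
                     labels = c("1-5", "5-10", "10-13", "13-20", "20 +"))) -> mydf2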
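In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below builds the same long table with pivot_longer(); it runs on a few rows copied from the question so it works on its own, and `mydf`/`mydf2`/`whatever` simply reuse the names from the answer.

```r
library(dplyr)
library(tidyr)

# A few rows of the question's data so the sketch is self-contained;
# in practice, use your full data frame instead.
mydf <- data.frame(
  season = c("fall", "fall", "winter", "spring"),
  age    = c(3.944, 9.994, 2.631, 25.724),
  value1 = c(6.432, 16.111, NA, 27.272),
  value2 = c(10.272, 22.737, 13.889, 50.447)
)

# Stack value1 and value2 into a single `value` column;
# `whatever` records which original column each value came from.
mydf2 <- mydf %>%
  pivot_longer(cols = c(value1, value2), names_to = "whatever", values_to = "value") %>%
  mutate(group = cut(age, breaks = c(1, 5, 10, 13, 20, Inf),
                     labels = c("1-5", "5-10", "10-13", "13-20", "20 +")))

print(mydf2)
```

Missing values are kept by default; `values_drop_na = TRUE` in pivot_longer() would drop them instead.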


# One box per age group, one panel per season; the line joins the box medians.
ggplot(data = mydf2, aes(x = group, y = value, fill = group)) +
  geom_boxplot() +
  stat_summary(fun = median, geom = "line", aes(group = 1)) +
  facet_wrap(~ season)
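The line from stat_summary(fun = median, geom = "line", aes(group = 1)) connects the median of each box within a panel. To check those numbers directly, one option is the dplyr sketch below. It assumes dplyr 1.0.0 or later (for `.groups`) and uses a tiny long-format stand-in built from the question's two "fall" rows with ages 3.944 and 9.994.

```r
library(dplyr)

# Tiny long-format stand-in (season, group, value); replace with mydf2 from the answer.
mydf2 <- data.frame(
  season = c("fall", "fall", "fall", "fall"),
  group  = factor(c("1-5", "1-5", "5-10", "5-10"),
                  levels = c("1-5", "5-10", "10-13", "13-20", "20 +")),
  value  = c(6.432, 10.272, 16.111, 22.737)
)

# One median per panel (season) and per box (age group);
# these are the points the median line passes through.
medians <- mydf2 %>%
  group_by(season, group) %>%
  summarise(median_value = median(value, na.rm = TRUE), .groups = "drop")

print(medians)
```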

[Image: the resulting boxplots, one panel per season]


5 Comments

What a beautiful plot! Thanks for your help. But I am not sure what the y values of the boxes represent: value1, value2, or both combined? I want each set of boxes to represent one variable at a time. Could you provide more info or a modification?
I have been reading about the gather() function in R, but it is not easy to understand "gather(mydf, whatever, value, -(season:age)) %>%", especially what "whatever", "value", and "%>%" mean. Is whatever the "key" and value the variable?
@user2928318 %>% is the pipe operator from the magrittr package; dplyr uses and re-exports it. As for gather(), I recommend looking at the output of gather(mydf, whatever, value, -(season:age)); seeing the result will show you what happened. Basically, value1 and value2 are stacked into a single column, value, and another column, whatever, records which variable (value1 or value2) each row came from. Hope this gives you a better picture.
Thanks a lot for your help! I understand your code now and can apply it for my purpose. :)
@user2928318 I am glad to hear that you can now apply the code to your own case. I was happy to help. Good luck. :)
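On the first comment's question: in the answer's plot, each box pools value1 and value2, because both were stacked into the single value column. To plot one variable at a time, one option is to skip the reshaping and map that variable to y directly. The sketch below assumes ggplot2 3.3.0 or later (for `fun =`) and runs on a few rows copied from the question, including the row with the missing value1.

```r
library(ggplot2)

# A few rows of the question's data (wide format); use your full data frame in practice.
mydf <- data.frame(
  season = c("fall", "fall", "fall", "winter", "winter", "winter"),
  age    = c(3.944, 9.994, 3.101, 2.631, 7.384, 15.93),
  value1 = c(6.432, 16.111, 6.507, NA, 7.419, 21.356),
  value2 = c(10.272, 22.737, 14.372, 13.889, 11.851, 36.454)
)
mydf$group <- cut(mydf$age, breaks = c(1, 5, 10, 13, 20, Inf),
                  labels = c("1-5", "5-10", "10-13", "13-20", "20 +"))

# Map one variable to y; swap value1 for value2 to plot the other variable.
# Rows with a missing value1 are dropped first to avoid "removed rows" warnings.
p1 <- ggplot(subset(mydf, !is.na(value1)), aes(x = group, y = value1, fill = group)) +
  geom_boxplot() +
  stat_summary(fun = median, geom = "line", aes(group = 1)) +  # line through box medians
  stat_summary(fun = median, geom = "point") +
  facet_wrap(~ season)

print(p1)
```

Alternatively, keeping the long data from the answer, dplyr::filter(mydf2, whatever == "value1") keeps just one variable, and facet_grid(whatever ~ season) gives each variable its own row of panels.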
