I need to create some box plots showing the abundance of some bacterial taxa in different samples. My data looks like:

  my.data <- "Taxon 06.TO.VG    21.TO.V 02.TO.VG    41.TO.VG    30.TO.V 04.BA.V 34.TO.VG    01.BA.V 28.TO.VG    18.TO.O 44.TO.V 08.BA.O 07.BA.O 06.BA.V 11.TO.V 06.BA.VG    07.BA.VG    05.BA.VG    07.BA.V 05.BA.V 06.BA.O 02.BA.O 04.BA.O 01.BA.O 05.BA.O 03.BA.O 02.BA.VG    03.BA.V 02.BA.V 04.BA.VG    03.BA.VG    01.BA.VG    15.TO.O 31.TO.O 09.TO.O 27.TO.V 42.TO.VG    08.TO.VG    16.TO.O 07.TO.V 13.TO.O 32.TO.V 29.TO.VG    10.TO.V 25.TO.V 05.TO.VG    20.TO.O 19.TO.V 17.TO.O 35.TO.V 43.TO.O 24.TO.V 26.TO.VG    01.TO.VG    37.TO.O 04.TO.VG    33.TO.O 39.TO.VG    14.TO.O 12.TO.O 38.TO.VG    22.TO.O
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877 0.097387173 0.086673889 0.432662192 0.060246679 0.269535674 0.152713335 0.014511873 0.063421323 0.091253905 0.139856373 0.013677012 0.200847907 0.180712032 0.21332737  0.031756181 0.272166702 0.019861211 0.133804422 0.168692685 0.100862392 0.152431791 0.104702194 0.119352089 0.410334347 0.024104844 0.0493905   0.068065382 0.047854785 0.011860175 0.168986083 0.015748031 0.407974482 0.264409881 0.250364431 0.330547112 0.536443695 0.578045113 0.400459167 0.204446209 0.357879234 0.242751388 0.488863722 0.521495803 0.001852281 0.045638126 0.503566932 0.069072806 0.171181339 0.183629007 0.371751412 0.385231317 0.023690205 0.255697356 0.104054054 0.242741552 0.043973941 0.221033868 0.004587156
Prevotella  0.073080791 0.302011096 0.586048042 0.487603306 0.290973872 0.014897075 0   0.333254269 0.029445074 0   0.153034301 0.002399726 0.025658188 0.090664273 0.440294582 0.100688924 0   0   0   0   0   0.000227946 0.093623374 0   0.000197707 0.115987461 0.076442171 0   0.047507606 0.000210172 0.000243962 0.042079208 0.52184769  0   0.394750656 0   0   0.235787172 0   0.000936856 0.000300752 0   0.051607781 0   0   0   0.002289494 0.735586941 0.023828756 0   0.011200996 0   0.046374105 0   0.00044484  0.085421412 0.000455789 0.306756757 0   0.11970684  0.008912656 0.371559633"

I'm wondering about using ggplot2 to make the box plot, but I'm not sure how the data have to be formatted. I tried this:

df <- read.csv("my.data", header=T)
ggplot(data = df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Taxon))

but it gave me an error saying that the variable was not found. Can anyone help me?

Many thanks, Francesca

  • The error is pretty informative. Are the x values in your data called variable? If they are not, R will tell you it cannot find them. Also, your data looks wide; it needs to be long. Posting the result of dput(my.data) is much more productive than the format you have given your data in. Commented Dec 11, 2013 at 20:25
  • Have a look at this tutorial Commented Dec 11, 2013 at 20:42

1 Answer

A quick example of how to format your data (long format: one row per observation, with one column for the category and one for the value):

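# example dataset in long format
categs <- sample(LETTERS[1:3], 120, TRUE)            # random category label per row
y <- c(rnorm(40), rnorm(40, 3, 2), rnorm(40, 5, 3))  # numeric values to plot
dados <- data.frame(categs, y)

library(ggplot2)
# one box per category: category on x, value on y
ggplot(dados) + geom_boxplot(aes(x = categs, y = y))

# head(dados) looks like this (values differ between runs, since no seed is set)
#  categs          y
#1      B  0.7392673
#2      B -0.1694076
#3      A -2.3804024
#4      B  0.5999949
#5      A  0.5816400
#6      A  2.1263669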

See also http://ggplot2.org/
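
Applied to the table in the question, the same idea might look like the sketch below. It uses only the first six samples from the question as an excerpt, reads them from the string with read.table(text = ...), and reshapes from wide to long with base R. The column names variable and value are chosen to match the aes() call in the question. The group column is an assumption on my part: I am guessing that the part of the sample name after the leading number (for example TO.VG) identifies the sample group, so adjust that step if the names mean something else.

```r
# Excerpt of the wide table from the question (first six samples only)
my.data <- "Taxon 06.TO.VG 21.TO.V 02.TO.VG 41.TO.VG 30.TO.V 04.BA.V
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877 0.097387173 0.086673889
Prevotella 0.073080791 0.302011096 0.586048042 0.487603306 0.290973872 0.014897075"

# Read the string as a whitespace-separated table.
# check.names = FALSE keeps sample names such as "06.TO.VG" unchanged.
wide <- read.table(text = my.data, header = TRUE,
                   check.names = FALSE, stringsAsFactors = FALSE)

# Wide -> long: one row per (taxon, sample) pair
sample_cols <- setdiff(names(wide), "Taxon")
long <- data.frame(
  Taxon    = rep(wide$Taxon, times = length(sample_cols)),
  variable = rep(sample_cols, each = nrow(wide)),
  value    = unlist(wide[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the text after the leading sample number is the group label
long$group <- sub("^[0-9]+\\.", "", long$variable)

library(ggplot2)
# One box per group and taxon
p <- ggplot(long, aes(x = group, y = value, fill = Taxon)) + geom_boxplot()
p
```

With the full table, each group would contain more samples and the boxes become more meaningful. If you only want one box per taxon, use x = Taxon instead of x = group.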
