This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer! I have a working script that creates the attached boxplot in base R: https://i.sstatic.net/NaATo.jpg

This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons. I've looked at the following questions and they are close, but not complete: "Why does a boxplot in ggplot requires axis x and y?" and "How do you draw a boxplot without specifying x axis?"

My data is basically like "mtcars" if all the numerical variables were on the same scale. All I want to do is plot each variable on the same boxplot, like the basic R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), again all the boxes are on the same scale. What am I missing?

  • ggplot requires data in long format. You need to convert your data to long format with, e.g., tidyr::gather or reshape2::melt. This will not demo well on mtcars since (a) mtcars doesn't have ID variables for the x axis (though we could convert the rownames to a column) and (b) it wouldn't look very nice with some discrete data and almost nothing on the same scale. But if you get your data in long format, your ggplot should be as easy as ggplot(long_data, aes(x = variable, y = value)) + geom_boxplot(). Commented Aug 24, 2016 at 23:33
  • Basically, imagine mtcars had 75 vehicle models and 10 columns, each holding cylinders for a different year, so it covered 1986 to 1995. In base R I would just write: Commented Aug 24, 2016 at 23:53
  • SORRY, in base R I would just write something like: boxplot(mtcars$cyl1986, mtcars$cyl1987...) and so on. But I can't for the life of me do this simple boxplot in ggplot or qplot. I know it's because it's a more advanced package, but still. Commented Aug 24, 2016 at 23:56

1 Answer

Though I don't think mtcars makes a great example for this, here it is:

First, we make the data (hopefully) more similar to yours by using a column instead of rownames.

mt = mtcars
mt$car = row.names(mtcars)

Then we reshape to long format:

mt_long = reshape2::melt(mt, id.vars = "car")
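
It can help to look at the reshaped data before plotting. The sketch below repeats the two steps above and inspects the result, assuming reshape2 is installed. melt() names the new columns variable and value by default, which is why the plot call further down maps those two names.

```r
library(reshape2)

# Same steps as above: copy mtcars and move the row names into a column
mt <- mtcars
mt$car <- row.names(mtcars)

# One row per (car, variable) pair instead of one row per car
mt_long <- melt(mt, id.vars = "car")

str(mt_long)               # three columns: car, variable, value
levels(mt_long$variable)   # the 11 original column names, in column order
nrow(mt_long)              # 32 cars x 11 variables = 352 rows
```

Because variable is a factor whose levels follow the original column order, the boxes appear on the x axis in that same order.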

Then the plot is easy:

library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
    geom_boxplot()


Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather. I'd recommend reading the Tidy Data paper for more on this topic.
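
For data shaped like what is described in the comments (one row per location, one column per month, every reading on the same 0 to 1 scale), a minimal sketch might look like the following. Everything in it is an assumption for illustration: the data are simulated with runif(), and the names wide, long, location, month and reading are made up. It uses tidyr::pivot_longer, the newer replacement for tidyr::gather, so it assumes a reasonably recent tidyr.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # simulated readings, NOT real data

# Hypothetical wide data: 75 locations, one column per month
wide <- data.frame(location = paste0("site_", 1:75))
for (m in month.abb) {
  wide[[m]] <- runif(75)  # values between 0 and 1
}

# Wide to long: every column except location becomes a (month, reading) pair
long <- pivot_longer(wide, cols = -location,
                     names_to = "month", values_to = "reading")

# pivot_longer returns month as character; make it a factor in calendar
# order so ggplot does not sort the boxes alphabetically
long$month <- factor(long$month, levels = month.abb)

p <- ggplot(long, aes(x = month, y = reading)) +
    geom_boxplot()
p
```

A yearly-average column, if present, would be reshaped the same way; its name would just need to be added to the factor levels.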

10 Comments

Basically, imagine mtcars had 75 vehicle models and 10 columns, each holding only cylinders for a different year, so it covered 1986 to 1995. In base R I would just write something like: boxplot(mtcars$cyl1986, mtcars$cyl1987...) and so on. But I can't for the life of me do this simple boxplot in ggplot or qplot. I know it's because it's a more advanced package, but still. I tried this code and got something very different (I can't figure out how to attach it to this comment; such a noob).
Code shouldn't go in comments; it's very cramped. What you should do is reproducibly share some of your actual data. Put, say, dput(droplevels(head(your_data, 20))) in your question.
That said, if you open a new R session and run the code I show - assuming you have relatively current versions of ggplot2 and reshape2, you should match my output exactly. There's no smoke and mirrors here. I copy/pasted my code and plot into this answer.
I'm very much out of my league here, so I thank you immensely for your willingness to help me. I may have to just keep my base R plot, because I simply can't figure this out. I don't know how to paste anything further in this comment section (let the downvotes continue!), but at best I can just describe my dataset. I have a csv file with 13 columns and 75 rows of data. The rows are locations with mercury readings, so 75 different locations. The columns are mercury readings for each month (the first column is just each location name).
Gregor, I want to thank you for helping me first and very fast last night. I've got everything working well now as of this morning and it's because of you. You rock!