When using base R's boxplot() function, I can pass my data frame directly as the argument and a perfect boxplot emerges, e.g.:

baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)

[Image: the boxplot produced by the base R boxplot() function]

However, I need to make this boxplot look more polished, so I need to use ggplot2. When I pass the data frame in as it is, no boxplot is drawn, because I need to specify x, y and fill aesthetics, which I don't have. My y values are the values in each column of the data frame and my x values are just the column names. How can I do this with ggplot2? Is there a way to reshape my data frame so that it splits into these coordinates, or is there a way for ggplot to read my data as it is?

2 Answers

2

geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...

library(tidyverse)

naive_capqx %>%  
  pivot_longer(everything(), values_to="Value", names_to="Variable") %>% 
  ggplot() +
  geom_boxplot(aes(x=Variable, y=Value))

giving

[Image: the resulting ggplot2 boxplot]
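
For reference, here is a minimal sketch of what the reshaped (long) data looks like before it reaches ggplot. The object name long_capqx is only an illustrative choice and does not appear in the answer above.

```r
library(tidyr)

baseline <- c(0, 0, 0, 0, 1)
post_cap <- c(1, 5, 5, 6, 11)
qx314 <- c(0, 0, 0, 3, 7)
naive_capqx <- data.frame(baseline, post_cap, qx314)

# One row per observation: the former column name goes into `Variable`,
# the measurement goes into `Value`
long_capqx <- pivot_longer(naive_capqx, everything(),
                           names_to = "Variable", values_to = "Value")

dim(long_capqx)   # 15 rows (3 columns x 5 values each) and 2 columns
head(long_capqx)  # inspect the first few rows
```

With the data in this shape, the column names have become ordinary values, so they can be mapped to the x aesthetic.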

2 Comments

Hi, I tried using this code but my R cannot decipher %>%. Is this a separate package I may not have loaded?
%>% comes from the magrittr package, which is part of the tidyverse, so library(tidyverse) makes it available. My code is a MWE and should run as-is. You can avoid using %>% if you wish by replacing naive_capqx %>% pivot_longer(everything()... with pivot_longer(naive_capqx, everything().... Note that @RebeccaAmodeo's solution, whilst correct, uses gather(), which has been superseded by the pivot_*() functions.
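
A sketch of the pipe-free variant described in the comment above, assuming the same data as in the question. The intermediate names long_capqx and p are made up for illustration.

```r
library(tidyr)
library(ggplot2)

naive_capqx <- data.frame(
  baseline = c(0, 0, 0, 0, 1),
  post_cap = c(1, 5, 5, 6, 11),
  qx314    = c(0, 0, 0, 3, 7)
)

# Same reshaping as in the answer, but the data frame is passed as the
# first argument instead of being piped in with %>%
long_capqx <- pivot_longer(naive_capqx, everything(),
                           values_to = "Value", names_to = "Variable")

p <- ggplot(long_capqx) +
  geom_boxplot(aes(x = Variable, y = Value))

print(p)
```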
1

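Convert the data frame to long format. Below, I use gather() to lengthen it, then group_by() to group the rows by key (the former column names). The group_by() step is optional for the plot itself, since geom_boxplot() already draws one box per discrete x value.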

pacman::p_load(ggplot2, tidyverse)  # needs the pacman package; library(tidyverse) also works

baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)

# Reshape to long format: former column names go to `key`, values to `value`
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
  gather("key", "value") %>%
  group_by(key)

ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
  geom_boxplot()
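
Since the question also mentions a fill aesthetic, here is one possible sketch of how it could be added once the data is long. It assumes pivot_longer() in place of gather(), keeps the key/value column names, and uses theme_minimal() and the axis label purely as example styling choices. None of this is from the original answer.

```r
library(tidyr)
library(ggplot2)

naive_capqx <- data.frame(
  baseline = c(0, 0, 0, 0, 1),
  post_cap = c(1, 5, 5, 6, 11),
  qx314    = c(0, 0, 0, 3, 7)
)

# Long format with the same column names as the answer above
long_capqx <- pivot_longer(naive_capqx, everything(),
                           names_to = "key", values_to = "value")

# Map the former column name to both x and fill so each box gets its own colour
p <- ggplot(long_capqx, aes(x = key, y = value, fill = key)) +
  geom_boxplot() +
  labs(x = NULL, y = "Value") +
  theme_minimal() +
  theme(legend.position = "none")  # the legend would only repeat the x axis

print(p)
```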

2 Comments

Hi, sorry, but what does %>% mean? My R cannot understand it!
Hi @NuritEliana, thanks for asking! It is an operator called the "pipe". It takes whatever is to its left and passes it on to the next command as the first argument. That removes the need to enter the data argument in the command that follows, because R knows that the object before the pipe is what you want to use there. I think of it as saying "and then" to R. It comes from the magrittr package and becomes available when you load dplyr or the tidyverse; R 4.1 and later also has its own native pipe, written |>. It saves a lot of typing! Hope that helps!
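
A small sketch of the "and then" idea, using one of the vectors from the question. The variable names x, nested, piped and native are just for illustration, and the last form assumes R 4.1 or later.

```r
library(magrittr)  # provides %>%; library(dplyr) or library(tidyverse) also make it available

x <- c(1, 5, 5, 6, 11)

# Without a pipe: read from the inside out
nested <- round(mean(x), 1)

# With the magrittr pipe: take x, and then mean(), and then round()
piped <- x %>% mean() %>% round(1)

# With the native pipe (R >= 4.1 only)
native <- x |> mean() |> round(1)

identical(nested, piped)   # TRUE
identical(nested, native)  # TRUE
```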
