
I need to make a lot of boxplots for an upcoming publication. I would like to use ggplot2 because I think it will be more flexible for future projects, but my PI is insisting that I make these plots in the style of base-R. He specifically wants the dashed lines, so that they will appear similar to previous plots we made. I have made an example using the iris dataset to show you, using this code:

plot(iris$Species,
     iris$Sepal.Length,
     xlab='Species',
     ylab='Sepal Length',
     main='Sepal Variation Across Species',
     col='white')

base R plot

How can I make a similar-looking plot using ggplot2?

Here is my attempt:

library("ggplot2")
ggplot(iris) +
  geom_boxplot(aes(x=Species,y=Sepal.Length),linetype="dashed") +
  ggtitle("Sepal Variation Across Species")

ggplot attempt

I need a combination of dashed and solid lines, but I cannot get anything to work. I have already checked this question: https://stats.stackexchange.com/questions/8137/how-to-add-horizontal-lines-to-ggplot2-boxplot. It is very close but has no dashed lines, which we need. Also, the outliers in my ggplot2 attempt are filled circles, whereas base R draws open circles.

  • Use outlier.color = 'black' and outlier.fill = 'white' to reproduce the circles. Commented Nov 6, 2018 at 11:08
  • Unfortunately there are no individual/separate aesthetic mappings for the IQR or median line but you could make your own version of geom_boxplot() and modify what gets passed into github.com/tidyverse/ggplot2/blob/master/R/… by adding such params. Commented Nov 6, 2018 at 11:09

3 Answers

To generate a "base R style" boxplot using ggplot2, we can stack four boxplot layers on top of one another. The order matters, so keep it in mind if you modify the code. I strongly suggest exploring the code by plotting each layer on its own; that way you can get a feel for how the layers interact.

The ordering of the boxplots works like this (ordered from bottom to top):

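  • (1) a fully dashed boxplot is drawn first; it supplies the vertical dashed whiskers
  • (2) a solid box containing the median line, with its whiskers collapsed onto the box edges, which covers the dashed box from (1)
  • (3) & (4) solid horizontal caps at the whisker ends, drawn as error bars of zero height: ymin is set to ymax for the upper cap, and ymax is set to ymin for the lower cap.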
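The layering above can be explored with a minimal sketch like the following. It assumes ggplot2 3.3.0 or later, because it uses after_stat(), the newer spelling of the `..name..` notation; the names `base`, `box_stats` and `layers` are only illustrative.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Species, y = Sepal.Length))

# Whisker ends, hinges and median that stat_boxplot() computes per species
box_stats <- layer_data(base + geom_boxplot())[
  , c("ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)

# One plot per layer, so that each can be viewed in isolation
layers <- list(
  dashed    = base + geom_boxplot(linetype = "dashed", outlier.shape = 1),
  solid_box = base + stat_boxplot(aes(ymin = after_stat(lower),
                                      ymax = after_stat(upper)),
                                  outlier.shape = 1),
  upper_cap = base + stat_boxplot(geom = "errorbar",
                                  aes(ymin = after_stat(ymax))),
  lower_cap = base + stat_boxplot(geom = "errorbar",
                                  aes(ymax = after_stat(ymin)))
)

print(layers$solid_box)  # swap in any other element to view that layer
```

Printing `layers$dashed`, then `layers$solid_box`, and so on shows what each layer contributes before they are stacked.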

I also added custom breaks to match your base R plot, which you can change depending on your needs. panel.border is used to create a thin border in the style of base R. To get the open circles that you want, we set outlier.shape = 1.

The code:

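library("ggplot2")

# Note: ggplot2 >= 3.4.0 deprecates the ..name.. notation in favour of
# after_stat(name), and `size` for lines in favour of `linewidth`.
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  # (1) fully dashed boxplot; outlier.shape = 1 draws open circles
  geom_boxplot(linetype = "dashed", outlier.shape = 1) +
  # (2) solid box and median line; whiskers collapsed onto the box edges
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), outlier.shape = 1) +
  # (3) solid cap at the end of the upper whisker
  stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..)) +
  # (4) solid cap at the end of the lower whisker
  stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..)) +
  scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) +
  labs(title = "Sepal Variation Across Species",
       x = "Species",
       y = "Sepal Length") +
  theme_classic() + # remove panel background and gridlines
  theme(plot.title = element_text(hjust = 0.5,  # hjust = 0.5 centers the title
                                  size = 14,
                                  face = "bold"),
        panel.border = element_rect(linetype = "solid",
                                    colour = "black", fill = NA, size = 0.5))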

The plot:

(image: the resulting ggplot2 boxplot)

Not quite exactly the same, but it seems to be a decent approximation. Hopefully this is close enough for your needs. Good luck, and happy plotting!


Here's a wrapper around @Marcus' great solution, for convenient use and more flexibility:

geom_boxplot2 <- function(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge2", 
                          ..., outlier.colour = NULL, outlier.color = NULL, outlier.fill = NULL, 
                          outlier.shape = 1, outlier.size = 1.5, outlier.stroke = 0.5, 
                          outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, varwidth = FALSE, 
                          na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
                          linetype = "dashed"){
  list(
    geom_boxplot(mapping = mapping, data = data, stat = stat, position = position,
                 outlier.colour = outlier.colour, outlier.color = outlier.color, 
                 outlier.fill = outlier.fill, outlier.shape = outlier.shape, 
                 outlier.size = outlier.size, outlier.stroke = outlier.stroke, 
                 outlier.alpha = outlier.alpha, notch = notch, 
                 notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm, 
                 show.legend = show.legend, inherit.aes = inherit.aes, 
                 linetype = linetype, ...),
    # solid box; reuse outlier.shape so this layer does not redraw outliers differently
    stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), outlier.shape = outlier.shape),
    stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..)),  # upper whisker cap
    stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..)),  # lower whisker cap
    theme_classic(), # remove panel background and gridlines
    theme(plot.title = element_text(hjust = 0.5,  # hjust = 0.5 centers the title
                                    size = 14,
                                    face = "bold"),
          panel.border = element_rect(linetype = "solid",
                                      colour = "black", fill = NA, size = 0.5))
  )
}

ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot2() +
  scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) + # not sure how to generalize this
  labs(title = "Sepal Variation Across Species", y = "Sepal Length")
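One possible way to generalize the hard-coded breaks is to pass a function to `breaks`, since scale_y_continuous() accepts a function that takes the scale limits and returns the break positions. The sketch below is only an illustration: `step_breaks` is a hypothetical helper, not part of ggplot2, and a plain geom_boxplot() stands in for the wrapper so that the block is self-contained.

```r
library(ggplot2)

# Hypothetical helper: returns a function that places breaks every `step`
# units, from the multiple of `step` at or below the lower limit to the
# multiple at or above the upper limit.
step_breaks <- function(step = 0.5) {
  function(limits) {
    seq(floor(limits[1] / step) * step,
        ceiling(limits[2] / step) * step,
        by = step)
  }
}

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = 1) +
  scale_y_continuous(breaks = step_breaks(0.5))
```

Breaks that fall outside the scale limits are not drawn, so the helper can overshoot the data range without adding empty ticks.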


Building further on what @Marcus and @Moody_Mudskipper have provided:

geom_boxplotMod <- function(mapping = NULL, data = NULL, stat = "boxplot", 
    position = "dodge2", ..., outlier.colour = NULL, outlier.color = NULL, 
    outlier.fill = NULL, outlier.shape = 1, outlier.size = 1.5, 
    outlier.stroke = 0.5, outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5,
    varwidth = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
    linetype = "dashed") # argument list adapted from args(geom_boxplot)
    {
    list(geom_boxplot(
            mapping = mapping, data = data, stat = stat, position = position,
            outlier.colour = outlier.colour, outlier.color = outlier.color, 
            outlier.fill = outlier.fill, outlier.shape = outlier.shape, 
            outlier.size = outlier.size, outlier.stroke = outlier.stroke, 
            outlier.alpha = outlier.alpha, notch = notch, 
            notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm, 
            show.legend = show.legend, inherit.aes = inherit.aes, linetype = 
            linetype, ...),
        stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..), width = 0.25),
        # width = 0.25 narrows the error-bar caps
        stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..), width = 0.25),
        stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..),
            outlier.shape = 1),
        theme(panel.background = element_blank(),
            panel.border = element_rect(size = 1.5, fill = NA),
            plot.title = element_text(hjust = 0.5),
            axis.title = element_text(size = 12),
            axis.text = element_text(size = 10.5))
        )
    }

library(ggplot2)
ggplot(iris, aes(x=Species,y=Sepal.Length, colour = Species)) +
    geom_boxplotMod() +
    ggtitle("Sepal Variation Across Species")

Created on 2020-07-20 by the reprex package (v0.3.0)
