I have an outlier data point I would like to include in my plot. I seem to have different options, including using coord_cartesian(xlim=..) and the ggbreak package. I'm not sure what is wrong with my coord_cartesian() code but it cuts off the outlier. Below is a sample of my dataset and code:

library(tidyverse)

TotalPhosphorus <- tibble(
  Depth = (c('1' ,'2', '3' , '1', '2', '3')),
  Date = c('1/18/2021', '1/18/2021', '1/18/2021', '7/11/2021', '7/11/2021', '7/11/2021'),
  DOY = c('18', '18', '18', '70', '70', '70'),
  Season= c("winter", "winter", "winter", "summer", "summer", "summer"),
  TotalP = c('20', '30', '400', '25', '30', '25')
  )

TotalPhosphorus %>%
  ggplot(aes(x = TotalP, y = Depth, colour = Season, group = Season)) +
  geom_point() +
  geom_path() +
  scale_x_continuous(breaks = seq(0, 100, by = 50)) +
  coord_cartesian(xlim = c(0, 400)) +
  theme_bw()

And the original plot showing the cut-off outlier: [image not reproduced]

  • The TotalP column in your example data has character values. If I change those to numbers by removing the quotation marks, the plot looks as expected. However, I don't see how coord_cartesian() can be used to make a break in an axis. Try using the ggbreak package. Commented Nov 25, 2024 at 3:38
  • I don't think you should specify the breaks manually as you do with scale_x_continuous(): you specify only 0, 50 and 100, so there is no break around 400. coord_cartesian() acts like a zoom, so there is no need to specify it either. And since you mentioned ggbreak, I think you can use it like this: ggbreak::scale_x_break(c(50, 380)). Personally, I think you should avoid this representation, or at least avoid using geom_path(). (Remember to remove or adjust scale_x_continuous().) Commented Nov 25, 2024 at 19:25
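
To illustrate the two comments above, here is a minimal sketch that converts the character columns to numeric and lets the x-axis breaks run all the way to 400. It uses a trimmed copy of the question's sample data; the use of mutate(across(...)) and the object name p_numeric are my own choices for illustration, not something taken from the question, and the line geom is left out as the second comment suggests.

```r
library(dplyr)
library(ggplot2)

# Sample data as posted in the question: the plotted columns are character
TotalPhosphorus <- tibble(
  Depth  = c('1', '2', '3', '1', '2', '3'),
  Season = c("winter", "winter", "winter", "summer", "summer", "summer"),
  TotalP = c('20', '30', '400', '25', '30', '25')
)

# Convert the two plotted columns to numeric before plotting
TotalPhosphorus <- TotalPhosphorus %>%
  mutate(across(c(Depth, TotalP), as.numeric))

# Check the range the x-axis has to cover
range(TotalPhosphorus$TotalP)

# Breaks now extend to 400, so the outlier region gets ticks as well
# (p_numeric is a hypothetical name used only in this sketch)
p_numeric <- ggplot(TotalPhosphorus,
                    aes(x = TotalP, y = Depth, colour = Season)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 400, by = 50)) +
  theme_bw()
```

Printing p_numeric draws the plot. No coord_cartesian() call is needed here because the default continuous scale already spans the full range of the numeric data.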

1 Answer

Here is a way to use ggbreak to truncate your plot. I agree with @VinceGreg: using a line geom in conjunction with an axis gap is not recommended, for reasons that will become apparent in the following examples.

I have modified your sample data to make both the x- and y-axis variables numeric, as this makes plotting easier, and I inverted and extended the y-axis to match your example plot.

First, using your defined breaks/limits:

library(dplyr)
library(ggplot2)
library(ggbreak)

TotalPhosphorus <- tibble(
  Depth = (c(1 ,2, 3 , 1, 2, 3)),
  Date = c('1/18/2021', '1/18/2021', '1/18/2021', '7/11/2021', '7/11/2021', '7/11/2021'),
  DOY = c('18', '18', '18', '70', '70', '70'),
  Season= c("winter", "winter", "winter", "summer", "summer", "summer"),
  TotalP = c(20, 30, 400, 25, 30, 25)
  )

TotalPhosphorus %>%
  ggplot(aes(x = TotalP, y = Depth, colour = Season, group = Season)) +
  geom_point() +
  geom_path() +
  scale_x_break(c(100, 390),
                space = 0.2) +
  scale_x_continuous(breaks = seq(0, 400, by = 50),
                     limits = c(0, 400)) +
  scale_y_reverse(limits = c(8, 0)) +
  labs(x = "Total Phosphorus ug/L",
       y = "Depth (m)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

[Plot 1: output of the code above]

Note that the geom_path() line looks odd where it crosses the axis gap, and visually, I do not think it makes sense to plot a line this way. I am not an SME for this type of data, so there may be a justification for using geom_path(), but graphically it seems aes(shape = Season) would be a better choice.

Further, there are limitations to using ggbreak, including the inability to use coord_cartesian(clip = "off"). In other words, if you need to set limits = c(0, 400), the outlier point, which sits exactly on the upper limit, will get clipped at the edge of the plot panel. Also, I personally do not like the duplicated axis labels/ticks that ggbreak defaults to.

Another issue highlighted in the above plot relates to the size of the axis gap. Choosing axis gap values that fall between two axis breaks can result in no label/tick to identify where the panel after the gap starts. You could set the gap to end on an axis break value, e.g. scale_x_break(c(100, 350)), but this results in a disproportionate amount of plot 'real estate' being assigned to a single point.
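
As a rough sketch of the alternative mentioned in the preceding paragraph, the code below ends the gap at 350, which is one of the seq(0, 400, by = 50) axis breaks. It reuses a trimmed copy of the numeric sample data and omits geom_path(); the object name p_aligned is hypothetical and used only here.

```r
library(ggplot2)
library(ggbreak)

# Trimmed, numeric copy of the sample data
TotalPhosphorus <- data.frame(
  Depth  = c(1, 2, 3, 1, 2, 3),
  Season = c("winter", "winter", "winter", "summer", "summer", "summer"),
  TotalP = c(20, 30, 400, 25, 30, 25)
)

# The gap runs from 100 to 350; both values are existing axis breaks,
# so the panel after the gap starts on a labelled tick
p_aligned <- ggplot(TotalPhosphorus,
                    aes(x = TotalP, y = Depth, colour = Season)) +
  geom_point() +
  scale_x_break(c(100, 350), space = 0.2) +
  scale_x_continuous(breaks = seq(0, 400, by = 50),
                     limits = c(0, 400)) +
  scale_y_reverse(limits = c(8, 0)) +
  theme_bw()
```

Printing p_aligned draws the plot. The second panel now covers 350 to 400 for the sake of one point, which is the trade-off described above.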

To address these issues, here are some suggested workarounds:

  1. use shapes to represent points and omit lines
  2. increase limits = to ensure points are not clipped
  3. create custom axis breaks and labels
  4. change the size of the breaks

The following uses options 1-3 and omits the top x-axis labels:

TotalPhosphorus %>%
  ggplot(aes(x = TotalP, y = Depth, colour = Season, shape = Season)) +
  geom_point(size = 2) +
  scale_shape_manual(values = c(24, 19)) +
  scale_x_break(c(100, 390),
                space = 0.2) +
  scale_x_continuous(breaks = c(seq(0, 100, 50), 390, 400),
                     limits = c(0, 402)) +
  scale_y_reverse(limits = c(8, 0)) +
  labs(x = "Total Phosphorus ug/L",
       y = "Depth (m)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
        axis.text.x.top = element_blank(),
        axis.ticks.x.top = element_blank(),
        axis.line.x.top = element_blank())

[Plot 2: output of the code above]
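
Option 4 in the list above is not shown in that code. One possible reading of it is to shrink the panel that follows the gap, so the single outlier takes up less room. The sketch below assumes that the scales argument of scale_x_break() accepts a numeric relative size for the panel after the break, which is how I understand the ggbreak documentation; the value 0.25, the smaller space value, and the name p_narrow are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggbreak)

# Trimmed, numeric copy of the sample data
TotalPhosphorus <- data.frame(
  Depth  = c(1, 2, 3, 1, 2, 3),
  Season = c("winter", "winter", "winter", "summer", "summer", "summer"),
  TotalP = c(20, 30, 400, 25, 30, 25)
)

# scales = 0.25 (assumed meaning): the panel after the gap is drawn at
# roughly a quarter of the width of the panel before it
# space = 0.1: a narrower gap between the two panels than in the plots above
p_narrow <- ggplot(TotalPhosphorus,
                   aes(x = TotalP, y = Depth, colour = Season, shape = Season)) +
  geom_point(size = 2) +
  scale_shape_manual(values = c(24, 19)) +
  scale_x_break(c(100, 390), scales = 0.25, space = 0.1) +
  scale_x_continuous(breaks = c(seq(0, 100, 50), 390, 400),
                     limits = c(0, 402)) +
  scale_y_reverse(limits = c(8, 0)) +
  theme_bw()
```

Printing p_narrow draws the plot. If the panel sizes do not change as expected, check the installed ggbreak version's help page for scale_x_break() before relying on this.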
