Plotting missing values in ggplot2 with a separate line type

Question

I am creating a line plot using ggplot2, but I have missing data that is denoted by NaN. My line plot is currently not adding any line between the missing values. However, I want to connect the missing data with a dotted line, while all known data is connected with a solid line.

Here is my code for the current plot, with a small subset of my data frame and an image of the plot below.

#make ggplots for all data sets  

library(ggplot2)

Q4_plot <- ggplot(data = Q4, mapping = aes(x = Year, y = Q4)) +
  geom_line() +
  geom_point() +
  labs(title = "Quarter 4 Anomalies of C. finmarchicus Population") +
  ylab("Anomaly") +
  scale_y_continuous(breaks = seq(-1.5, 1.5, by = 0.5), limits = c(-1.5, 1.5))

#subset of data frame

> dput(Q4)
structure(list(Year = c(1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017), Q4 = c(-0.2902210281654, 
-0.4349222339894, 0.6085474376776, 0.8492088796228, 0.5017554154123, 
0.4848742371842, 0.483138540113, 1.134146387603, 1.095609559681, 
0.8630386289353, 0.1163274274306, -0.3398165357991, -0.1474840957078, 
-1.344090916262, 0.3518846850911, -0.03353853195848, -0.07481708144361, 
0.2717396470301, -1.43888104698, -0.4838212547847, -0.8460008644647, 
1.061274634085, 0.1433575405896, 0.6949323748611, 0.4219329126636, 
-0.1924723455514, -0.2699464637352, NaN, 0.4931694954279, 0.7079867355531, 
-0.243929992349, 0.9881050229247, -0.2275292445512, NaN, 0.3237764596434, 
-0.3144133941847, 0.6111879054247, NaN)), row.names = c(NA, -38L
), class = c("tbl_df", "tbl", "data.frame"))

This is what my plot looks like now. I want to add a dotted line in the areas where the solid line is broken.

I apologize if this is badly asked or worded; I am a new R user.

Can you please include a small subset of the yearly_average_anamolies data frame? You can use the dput function: run dput(yearly_average_anamolies), then paste the output into your question. – Tony Ladson, commented Jun 25, 2019 at 23:22

Jon Spring · Accepted Answer · 2019-06-26 00:41:58Z

8

Here's an automated solution which relies on identifying the points on either side of missing data and feeding those into a separate geom_line.

library(dplyr)
library(ggplot2)

gaps <- my_data %>%
  filter(is.na(lead(Annual)) & row_number() != n() |
          is.na(lag(Annual)) & row_number() != 1) %>%
  # This is needed to make a separate group for each pair of points.
  #  I expect it will break if a point ever has NA's on both sides...
  #  Anyone have a better idea?
  mutate(group = cumsum(row_number() %% 2))

ggplot(data = my_data, mapping = aes(x = Year, y = Annual)) +
  geom_line() +
  geom_line(data = gaps, aes(group = group), linetype = "dashed") +
  geom_point() + 
  labs(title = "Annual Anomalies of C. finmarchicus Population")

Fake data (run this first so that my_data exists):

set.seed(0)
my_data = data.frame(Year = 2000:2019,
                     Annual = sample(c(-5:5, NA_integer_), 10))
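To see why the code comment above expects trouble, here is a base-R emulation of the same lead()/lag() filter and the cumsum(row_number() %% 2) pairing, run on a small made-up vector in which the value 3 has an NA on both sides. The vector annual and the helper variables are illustrative only; dplyr is not needed because lead() and lag() are imitated by shifting the vector one position.

```r
# Made-up series: the middle observation (3) has an NA on both sides.
annual <- c(1, NA, 3, NA, 5)
n <- length(annual)

# Emulates is.na(lead(Annual)) & row_number() != n():
# shift left by one; the last row never counts as "followed by NA".
next_na <- c(is.na(annual[-1]), FALSE)
# Emulates is.na(lag(Annual)) & row_number() != 1:
# shift right by one; the first row never counts as "preceded by NA".
prev_na <- c(FALSE, is.na(annual[-n]))

demo_gaps <- data.frame(row = which(next_na | prev_na))
demo_gaps$value <- annual[demo_gaps$row]
# Same pairing rule as mutate(group = cumsum(row_number() %% 2)).
demo_gaps$group <- cumsum(seq_len(nrow(demo_gaps)) %% 2)
print(demo_gaps)
```

Rows 1 and 3 (values 1 and 3) share group 1, but row 5 ends up alone in group 2, so the 3-to-5 gap gets a one-point group and no dashed line is drawn for it.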

edited Jun 26, 2019 at 0:41

answered Jun 26, 2019 at 0:10

Jon Spring


1 Comment

Emilio M. Bruna Over a year ago

Re the code comments "This is needed to make a separate group for each pair of points. I expect it will break if a point ever has NA's on both sides... Anyone have a better idea?": it does break if there is more than one NA in a row. I added a step that detects consecutive NAs and keeps only the first NA in each run:

mutate(keep = ifelse(is.na(Annual) & lag(is.na(Annual)) == TRUE, "del", "keep")) %>%
  filter(keep == "keep")
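One possible way around both failure modes (a point with NA on both sides, and runs of several NAs) is to skip the pairing step entirely: build one row per gap, holding the last observed point before the missing run and the first observed point after it, and draw those rows with geom_segment(). This is a sketch, not part of the answer above; the helper name gap_segments() and the toy data are hypothetical, and the plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper: one row per gap, from the last observed point before a
# run of missing values to the first observed point after it.
gap_segments <- function(x, y) {
  ok  <- !is.na(y)        # FALSE for missing values (is.na() catches NA and NaN)
  idx <- which(ok)        # positions of observed points in the original data
  xo  <- x[ok]
  yo  <- y[ok]
  # Two consecutive observed points have at least one missing value between
  # them exactly when their original positions are more than 1 apart.
  jump <- diff(idx) > 1
  data.frame(
    x    = head(xo, -1)[jump],
    y    = head(yo, -1)[jump],
    xend = tail(xo, -1)[jump],
    yend = tail(yo, -1)[jump]
  )
}

# Toy data: a run of two NAs, a point (3) with a missing value on both sides,
# and a trailing NA.
d <- data.frame(Year = 1:8, Annual = c(1, 2, NA, NA, 3, NaN, 4, NA))
segs <- gap_segments(d$Year, d$Annual)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = Year, y = Annual)) +
    geom_line() +   # solid line, broken at the missing values
    geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend),
                 linetype = "dotted", inherit.aes = FALSE) +
    geom_point()
}
```

Because each gap is its own row, no grouping is needed and a point can start one segment and end another. Leading or trailing missing values get no segment, since nothing is observed on one side of them; applied to the question's Q4 data, gap_segments(Q4$Year, Q4$Q4) should give two rows (2006 to 2008 and 2012 to 2014) and nothing for the final 2017 NaN.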

Emilio M. Bruna · Accepted Answer · 2020-09-11 20:30:14Z

2

Why not (1) remove the NAs and then (2) plot a second, dashed line? Wherever there are data, the dashed line coincides with the solid one, so you will only see the dashes where there was a gap.

library(tidyverse)

df %>%
  ggplot(aes(x = year, y = anomaly)) +
  geom_point() +
  geom_line() +
  geom_line(data = filter(df, !is.na(anomaly)), linetype = "dashed")
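A sketch of the same overlay idea with the layer order reversed, so that the dashed line really sits under the solid one. This only matters if the gap line gets a different colour: with the order above, a coloured dashed line would be drawn over the solid segments. The toy year/anomaly data mirror the other answer on this page (before its NaN rows are filtered out); the grey colour is an arbitrary choice, and the ggplot2 part runs only if the package is installed.

```r
# Toy data in the shape used by the answers here (NaN marks missing years).
df <- data.frame(
  year = 2000:2009,
  anomaly = c(1, 1.5, NaN, 0.5, 0.5, 1, 1, NaN, 1.5, 1.5)
)

# Drop the missing rows for the dashed layer only: is.na() is TRUE for NaN too,
# so this layer becomes one unbroken path through every observed point.
observed <- df[!is.na(df$anomaly), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = year, y = anomaly)) +
    # Added first, so it is drawn underneath the solid line.
    geom_line(data = observed, linetype = "dashed", colour = "grey50") +
    # The full data still contain NaN, so this line breaks at the gaps
    # (expect a warning about removed rows when the plot is drawn).
    geom_line() +
    geom_point()
}
```

Print p to draw it; only the stretches across the missing years show the grey dashes.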

edited Sep 11, 2020 at 20:30

answered Sep 11, 2020 at 20:21

Emilio M. Bruna


cardinal40 · Accepted Answer · 2019-06-25 23:49:56Z

This is actually relatively complicated. Here's one way of doing it:

library(tidyverse) 

df <- 
  tibble(
    year = 2000:2009,
    anomaly = c(1, 1.5, NaN, 0.5, 0.5, 1, 1, NaN, 1.5, 1.5)
  ) %>% 
  # Flag which years belong to each solid (1, 3, 5) or dotted (2, 4) section.
  mutate(
    section1 = year < 2002,
    section2 = year %in% c(2001, 2003),
    section3 = year %in% 2003:2006,
    section4 = year %in% c(2006, 2008),
    section5 = year > 2007
  ) %>% 
  filter(!is.na(anomaly))

df %>% 
  ggplot(aes(x = year, y = anomaly)) +
  geom_point() +
  geom_line(data = df %>% filter(section1)) +
  geom_line(data = df %>% filter(section2), linetype = 3) +
  geom_line(data = df %>% filter(section3)) +
  geom_line(data = df %>% filter(section4), linetype = 3) +
  geom_line(data = df %>% filter(section5))

This divides the data set into five groups, with overlapping beginning and ending points for the dotted (linetype = 3) and solid lines. I also remove the NaN entries to stop ggplot from throwing a warning.
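The five section flags above are hand-written for this particular data set. Here is a sketch of computing equivalent sections automatically: observed points that sit between the same pair of gaps share a run id (a cumsum() over the missing-value flags), and each gap becomes a two-point section made of the last point of one run and the first point of the next. Everything goes into one long data frame with a type column mapped to linetype. The helper make_sections() and the column names section and type are hypothetical; the ggplot2 part runs only if the package is installed.

```r
# Hypothetical helper: split a series with NA/NaN gaps into solid runs and
# two-point "gap" sections, mirroring section1..section5 above.
make_sections <- function(x, y) {
  ok  <- !is.na(y)               # is.na() is TRUE for both NA and NaN
  run <- cumsum(!ok)[ok]         # run id: number of missing values seen so far
  xo  <- x[ok]
  yo  <- y[ok]
  brk <- which(diff(run) != 0)   # observed point brk[i] is followed by a gap

  solid <- data.frame(
    x = xo, y = yo,
    section = sprintf("solid%d", run),
    type = rep("solid", length(xo))
  )
  gap <- data.frame(
    # Last point before each gap, then the first point after it.
    x = c(xo[brk], xo[brk + 1]),
    y = c(yo[brk], yo[brk + 1]),
    section = rep(sprintf("gap%d", seq_along(brk)), times = 2),
    type = rep("gap", 2 * length(brk))
  )
  rbind(solid, gap)
}

df <- data.frame(
  year = 2000:2009,
  anomaly = c(1, 1.5, NaN, 0.5, 0.5, 1, 1, NaN, 1.5, 1.5)
)
sections <- make_sections(df$year, df$anomaly)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sections, aes(x = x, y = y)) +
    geom_line(aes(group = section, linetype = type)) +
    # Gap endpoints duplicate solid points, so draw each point once.
    geom_point(data = sections[sections$type == "solid", ]) +
    scale_linetype_manual(values = c(solid = "solid", gap = "dotted"))
}
```

For this data the computed sections match the hand-written ones: three solid runs (2000 to 2001, 2003 to 2006, 2008 to 2009) and two dotted bridges (2001 to 2003, 2006 to 2008). sprintf() is used for the labels because it returns an empty vector when there are no gaps, which keeps data.frame() from failing on gap-free data.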
