3

I am creating a line plot using ggplot2, but I have missing data that is denoted by NaN. My line plot is currently not adding any line between the missing values. However, I want to connect the missing data with a dotted line, while all known data is connected with a solid line.

Here is my code for the current plot, with a small subset of my data frame and an image of the plot below.

#make ggplots for all data sets  

Q4_plot <- ggplot(data = Q4, mapping = aes(x = Year, y = Q4)) +
  geom_line() +
  geom_point() +
  labs(title = "Quarter 4 Anomalies of C. finmarchicus Population") +
  ylab("Anomaly") +
  # y is numeric, so a continuous scale is used to set the axis breaks
  scale_y_continuous(breaks = seq(-1.5, 1.5, by = 0.5), limits = c(-1.5, 1.5))

#subset of data frame

> dput(Q4)
structure(list(Year = c(1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017), Q4 = c(-0.2902210281654, 
-0.4349222339894, 0.6085474376776, 0.8492088796228, 0.5017554154123, 
0.4848742371842, 0.483138540113, 1.134146387603, 1.095609559681, 
0.8630386289353, 0.1163274274306, -0.3398165357991, -0.1474840957078, 
-1.344090916262, 0.3518846850911, -0.03353853195848, -0.07481708144361, 
0.2717396470301, -1.43888104698, -0.4838212547847, -0.8460008644647, 
1.061274634085, 0.1433575405896, 0.6949323748611, 0.4219329126636, 
-0.1924723455514, -0.2699464637352, NaN, 0.4931694954279, 0.7079867355531, 
-0.243929992349, 0.9881050229247, -0.2275292445512, NaN, 0.3237764596434, 
-0.3144133941847, 0.6111879054247, NaN)), row.names = c(NA, -38L
), class = c("tbl_df", "tbl", "data.frame"))

This is what my plot looks like now, and I want to add a dotted line in the areas where the solid line is disjointed.

[Image: the current plot, with the solid line broken at the missing values]

I apologize if this is badly asked or worded; I am a new R user.

Can you please include a small subset of the yearly_average_anamolies data frame? You can use the dput function: run dput(yearly_average_anamolies), then paste the output into your question. Commented Jun 25, 2019 at 23:22

3 Answers

8

Here's an automated solution which relies on identifying the points on either side of missing data and feeding those into a separate geom_line.

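library(dplyr)
library(ggplot2)

# my_data is defined under "fake data" below.
# Keep the rows that sit next to a missing value, ignoring the neighbours
# that do not exist for the first and last rows.
gaps <- my_data %>%
  filter((is.na(lead(Annual)) & row_number() != n()) |
         (is.na(lag(Annual)) & row_number() != 1)) %>%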
  # This is needed to make a separate group for each pair of points.
  #  I expect it will break if a point ever has NA's on both sides...
  #  Anyone have a better idea?
  mutate(group = cumsum(row_number() %% 2))

ggplot(data = my_data, mapping = aes(x = Year, y = Annual)) +
  geom_line() +
  geom_line(data = gaps, aes(group = group), linetype = "dashed") +
  geom_point() + 
  labs(title = "Annual Anomalies of C. finmarchicus Population")


fake data:

set.seed(0)
my_data = data.frame(Year = 2000:2019,
                     Annual = sample(c(-5:5, NA_integer_), 10))
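
On the open question in the code comments: pairing rows by position is what fails when a point has NAs on both sides or when several NAs occur in a row. One possible alternative, sketched below, drops the positional pairing and instead connects each known point to the next known point whenever missing rows lie between them, drawing those bridges with geom_segment. The helper name bridge_gaps and the sample values are assumptions made for illustration; they are not from the answer.

```r
# Hypothetical helper (not part of the answer above): return one row per
# bridge from the last known point before a gap to the first known point
# after it.
bridge_gaps <- function(x, y) {
  known <- which(!is.na(y))            # is.na() is TRUE for both NA and NaN
  if (length(known) < 2) {
    return(data.frame(x = numeric(0), y = numeric(0),
                      xend = numeric(0), yend = numeric(0)))
  }
  from <- known[-length(known)]        # each known point ...
  to   <- known[-1]                    # ... paired with the next known point
  skip <- (to - from) > 1              # TRUE where missing rows lie in between
  data.frame(x = x[from[skip]], y = y[from[skip]],
             xend = x[to[skip]], yend = y[to[skip]])
}

# Invented sample data: a single NA, a run of two NaNs, and a point (2003)
# with missing values on both sides.
my_data <- data.frame(
  Year   = 2000:2009,
  Annual = c(1, 1.5, NA, 0.5, NaN, NaN, 1, 2, NA, 1.5)
)
gaps <- bridge_gaps(my_data$Year, my_data$Annual)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(my_data, aes(x = Year, y = Annual)) +
    geom_line() +                                  # solid, broken at NA/NaN
    geom_segment(data = gaps,
                 aes(x = x, y = y, xend = xend, yend = yend),
                 linetype = "dotted", inherit.aes = FALSE) +
    geom_point()
  print(p)
}
```

Because every bridge is its own row, a point such as 2003 here can end one dotted segment and start the next, and a leading or trailing NA simply produces no bridge.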

1 Comment

Regarding the code comment "This is needed to make a separate group for each pair of points. I expect it will break if a point ever has NA's on both sides... Anyone have a better idea?": it does break if there is more than one NA in a row. I added a line of code that detects consecutive NAs and keeps only the first one:

mutate(keep = ifelse(is.na(Annual) & lag(is.na(Annual) == TRUE), "del", "keep")) %>%
  filter(keep == "keep")

2

Why not (1) remove the NAs and then (2) plot a second, dashed line? The dashed line sits under the solid one, so you only see the dashes where there was a gap.

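# Assumes df has the columns year and anomaly.
df %>%
  ggplot(aes(x = year, y = anomaly)) +
  # Dashed line through the non-missing points only, drawn first so it sits underneath
  geom_line(data = filter(df, !is.na(anomaly)), linetype = "dashed") +
  # Solid line on the full data; it breaks at each NA and covers the dashes elsewhere
  geom_line() +
  geom_point()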
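
As a worked instance of this layering idea, here is a sketch on the last 13 rows of the question's data, with values rounded to two decimals for brevity. It uses linetype = "dotted" since the question asks for a dotted line; the object name known is an assumption.

```r
# Years 2005-2017 from the question's data frame, rounded to two decimals.
Q4 <- data.frame(
  Year = 2005:2017,
  Q4   = c(-0.19, -0.27, NaN, 0.49, 0.71, -0.24, 0.99,
           -0.23, NaN, 0.32, -0.31, 0.61, NaN)
)

# Rows with a known value; is.na() is also TRUE for NaN.
known <- Q4[!is.na(Q4$Q4), ]

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Q4, aes(x = Year, y = Q4)) +
    geom_line(data = known, linetype = "dotted") +  # continuous, spans the gaps
    geom_line() +                                   # solid, broken at each NaN
    geom_point()
  print(p)
}
```

The trailing NaN in 2017 has no known point after it, so nothing is drawn past 2016.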


0

This is actually relatively complicated. Here's one way of doing it:

library(tidyverse) 

df <- 
  tibble(
    year = 2000:2009,
    anomaly = c(1, 1.5, NaN, 0.5, 0.5, 1, 1, NaN, 1.5, 1.5)
  ) %>% 
  mutate(
    section1 = year < 2002,
    section2 = year %in% c(2001, 2003),
    section3 = year %in% 2003:2006,
    section4 = year %in% c(2006, 2008),
    section5 = year > 2007
  ) %>% 
  filter(!is.na(anomaly))

df %>% 
  ggplot(aes(x = year, y = anomaly)) +
  geom_point() +
  geom_line(data = df %>% filter(section1 == TRUE)) +
  geom_line(data = df %>% filter(section2 == TRUE), linetype = 3) +
  geom_line(data = df %>% filter(section3 == TRUE)) +
  geom_line(data = df %>% filter(section4 == TRUE), linetype = 3) +
  geom_line(data = df %>% filter(section5 == TRUE))

This divides the data set into five sections, where each dotted section shares its start and end points with the neighbouring solid sections. I also remove the NaN entries to stop ggplot from throwing a warning.
