I wrote a loop that makes a plot for every unique value of a variable within a group. To make my code reproducible I used the nycflights13 package. Unfortunately, with this data my code gives the desired result. In my own data, however, some flight origins do not occur in a certain year, which gives me an empty plot for that origin in that year. Within one group (in this example, year), I would like only the origins that occur in that year to be shown. Could somebody help me out?

library(nycflights13)
library(tidyverse)

plotter_de_plot<-function(origination, YEARR){
  eval(substitute(origination), flights)
  eval(substitute(YEARR), flights)
  flights %>%
    subset(year == YEARR) %>%
    select(month, origin, hour, year) %>%
    group_by(origin, month) %>% 
    mutate(AMOUNT = (sum(hour, na.rm=TRUE)))  %>%
    filter(!is.na(hour),
           origin==origination,year==YEARR) %>%
    ggplot(aes(month,AMOUNT), na.rm = TRUE)+
    geom_point() +
    labs(title=origination,subtitle=YEARR)
} 
for (i in unique(flights$origin)){
  plot(plotter_de_plot(i,2013))
}

  • In the for loop, add if (with(flights, sum(year == 2022 & origin == "EWR") == 0)) next ? Commented Oct 1, 2021 at 9:56

2 Answers

In addition to stefan's answer, which addresses the problem perfectly, I would recommend using purrr::map instead of your for loop:

my_plots = unique(flights$origin) %>% 
  set_names() %>% 
  map(plotter_de_plot, YEARR=2013)
my_plots$EWR
my_plots$LGA
my_plots$JFK

This way, you can access each plot inside a list. Another way would be to use facets.
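As a rough sketch of how such a list could be combined with a plotting function that returns NULL for empty subsets: the NULL entries can be dropped with purrr::discard() before printing. The toy data frame and the helper plot_or_null() below are made up for illustration and are not part of the question's code.

```r
library(purrr)
library(ggplot2)

# Hypothetical toy data: "EWR" has no rows for 2013
toy <- data.frame(
  year   = c(2013, 2013, 2015),
  origin = c("JFK", "LGA", "EWR"),
  month  = c(1, 2, 3),
  AMOUNT = c(10, 20, 30)
)

# Hypothetical helper: returns a plot, or NULL when the subset is empty
plot_or_null <- function(origination, yr) {
  d <- toy[toy$origin == origination & toy$year == yr, ]
  if (nrow(d) == 0) return(NULL)
  ggplot(d, aes(month, AMOUNT)) +
    geom_point() +
    labs(title = origination, subtitle = yr)
}

# Named list with one entry per origin, then drop the NULL entries
origins  <- set_names(unique(toy$origin))
my_plots <- map(origins, plot_or_null, yr = 2013)
my_plots <- discard(my_plots, is.null)

names(my_plots)        # only origins that had data in 2013
walk(my_plots, print)  # draw each remaining plot
```

Keeping the names on the list means each plot is still reachable as my_plots$JFK and so on.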

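Also, your plots are absurdly heavy (several MB) and might take a long time to draw. That is because you are using mutate() instead of summarise(): mutate() keeps one row per flight, so every flight is drawn as its own point, stacked on the others for the same month, whereas summarise() leaves one row per origin and month.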
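To make the row-count difference concrete, here is a small sketch on a made-up data frame (toy is hypothetical and only stands in for flights):

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- data.frame(
  origin = c("JFK", "JFK", "JFK", "LGA", "LGA"),
  month  = c(1, 1, 2, 1, 1),
  hour   = c(5, 6, 7, 8, 9)
)

# mutate() keeps every input row and repeats the group total on each one
with_mutate <- toy %>%
  group_by(origin, month) %>%
  mutate(AMOUNT = sum(hour, na.rm = TRUE)) %>%
  ungroup()

# summarise() collapses each group to a single row
with_summarise <- toy %>%
  group_by(origin, month) %>%
  summarise(AMOUNT = sum(hour, na.rm = TRUE), .groups = "drop")

nrow(with_mutate)     # 5: one point would be drawn per original row
nrow(with_summarise)  # 3: one point per origin/month combination
```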

Here is an example with facets that took <1 sec to compute:

flights %>%
  filter(year == 2013) %>%
  select(month, origin, hour, year) %>%
  group_by(origin, month) %>%
  # one row per origin and month; .groups = "drop" returns an ungrouped result
  summarise(AMOUNT = sum(hour, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(month, AMOUNT)) +
  geom_point() +
  labs(subtitle = "Year 2013") +
  facet_wrap(~origin)

2 Comments

I used mutate because I thought you lose the other variables with summarise, but I was mistaken. Should the = between my_plots and unique be <- ?
For a top-level assignment like this one, = and <- do the same thing. Coming from other programming languages, I prefer =, but you can choose whichever looks better to you :-)
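One place where the two operators do differ is inside a function call. A minimal base R sketch, where show_value() is a hypothetical helper defined only for this illustration:

```r
# At the top level the two forms assign identically
a = 1:3
b <- 1:3
identical(a, b)

# Hypothetical helper that simply returns its argument
show_value <- function(val) val

# Inside a call, `=` names an argument and creates no variable
show_value(val = 10)
exists("val", inherits = FALSE)

# `<-` inside a call assigns in the calling environment as a side effect
show_value(other <- 20)
exists("other", inherits = FALSE)
```

For plain assignments such as my_plots above, the choice is a matter of style.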

One option would be to break your pipeline into two parts, data wrangling and plotting. That way you can check whether the filtered and aggregated dataset contains any rows, e.g. with nrow(d) > 0, and return NULL if it doesn't. In your for loop you can then check for NULL before plotting.

To mimic your use case I used flights$year[flights$origin == "EWR"] <- 2015 so that the example data includes an origin with no data for year 2013:

library(nycflights13)
library(tidyverse)

plotter_de_plot <- function(origination, YEARR) {
  d <- flights %>%
    select(month, origin, hour, year) %>%
    filter(
      !is.na(hour),
      origin == origination, year == YEARR
    ) %>% 
    group_by(month) %>%
    mutate(AMOUNT = sum(hour, na.rm = TRUE))
    
  # Build the plot only when rows remain; otherwise the function returns NULL
  if (nrow(d) > 0) {
    ggplot(d, aes(month, AMOUNT)) +
      geom_point() +
      labs(title = origination, subtitle = YEARR)
  }
}

flights$year[flights$origin == "EWR"] <- 2015

for (i in unique(flights$origin)) {
  p <- plotter_de_plot(i, 2013)
  if (!is.null(p)) plot(p)
}
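A different way to get the same effect, sketched here under the assumption that dplyr is available, is to work out which origins occur in the requested year first and loop over only those. The data frame toy and the helper origins_in_year() are hypothetical names used for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for flights: "EWR" has no rows in 2013
toy <- data.frame(
  year   = c(2013, 2013, 2013, 2015, 2015),
  origin = c("JFK", "LGA", "JFK", "EWR", "EWR"),
  month  = c(1, 1, 2, 1, 2),
  hour   = c(5, 6, 7, 8, 9)
)

# Hypothetical helper: the origins that have at least one row in year `yr`
origins_in_year <- function(data, yr) {
  data %>%
    filter(year == yr) %>%
    distinct(origin) %>%
    pull(origin)
}

origins_2013 <- origins_in_year(toy, 2013)

# Loop only over origins that exist in 2013; the plotting call would go here
for (o in origins_2013) {
  message("would plot: ", o)
}
```

With this approach the plotting function never receives an empty subset, so no NULL check is needed in the loop.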
