Ggplot loop over unique variables in a group

Question

I wrote a loop that makes a plot for every unique value of a variable within a group. To make my code reproducible I used the nycflights13 package. Unfortunately, with this data my code gives the desired result. In my own data, however, some flight origins do not occur in a certain year, which gives me an empty plot for that origin in that year. Within one group (in this example, year), I would like only the origins that occurred in that year to be shown. Could somebody help me out?

library(nycflights13)
library(tidyverse)

plotter_de_plot<-function(origination, YEARR){
  eval(substitute(origination), flights)
  eval(substitute(YEARR), flights)
  flights %>%
    subset(year==YEARR)%>%
  select(month,origin,hour,year)%>%
    group_by(origin, month) %>% 
    mutate(AMOUNT = (sum(hour, na.rm=TRUE)))  %>%
    filter(!is.na(hour),
           origin==origination,year==YEARR) %>%
    ggplot(aes(month,AMOUNT), na.rm = TRUE)+
    geom_point() +
    labs(title=origination,subtitle=YEARR)
} 
for (i in unique(flights$origin)){
  plot(plotter_de_plot(i,2013))
}

In the for loop, add if (with(flights, sum(year == 2022 & origin == "EWR") == 0)) next ?
– Vincent Guillemot, Commented Oct 1, 2021 at 9:56
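
A minimal sketch of what that comment suggests, using only base R. The toy data frame, target_year and the plotted vector are assumptions made for illustration; in the real loop the marked line would call the asker's plotter_de_plot().

```r
# Toy stand-in for the asker's data (hypothetical values): EWR has no 2013 rows
toy <- data.frame(
  year   = c(2013, 2013, 2015),
  origin = c("JFK", "LGA", "EWR"),
  stringsAsFactors = FALSE
)
target_year <- 2013

plotted <- character(0)
for (o in unique(toy$origin)) {
  # Skip origins that have no rows in the target year
  if (sum(toy$year == target_year & toy$origin == o) == 0) next
  # In the real loop, plot(plotter_de_plot(o, target_year)) would go here
  plotted <- c(plotted, o)
}
plotted
```

Only origins with at least one row in the target year reach the plotting step, so no empty plot is produced.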

Dan Chaltiel · Accepted Answer · 2021-10-01 10:03:35Z

2

In addition to stefan's answer, which addresses the problem perfectly, I would recommend using purrr::map instead of your for loop:

my_plots = unique(flights$origin) %>% 
  set_names() %>% 
  map(plotter_de_plot, YEARR=2013)
my_plots$EWR
my_plots$LGA
my_plots$JFK

This way, you can access each plot inside a list. Another way would be to use facets.

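Also, your plots are absurdly heavy (several MB) and might take a long time to draw. That is because you are using mutate() instead of summarise(): mutate() keeps one row per flight, so every point is drawn many times on top of itself, whereas summarise() returns one row per group.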
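
To make the difference concrete, here is a small sketch on a toy data frame (hypothetical values, not taken from nycflights13) that compares the number of rows each verb hands to ggplot. It assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-in for flights (hypothetical values)
toy <- data.frame(
  origin = c("EWR", "EWR", "EWR", "JFK", "JFK"),
  month  = c(1, 1, 2, 1, 1),
  hour   = c(5, 6, 7, 8, NA),
  stringsAsFactors = FALSE
)

# mutate() adds the group total to every input row
with_mutate <- toy %>%
  group_by(origin, month) %>%
  mutate(AMOUNT = sum(hour, na.rm = TRUE)) %>%
  ungroup()

# summarise() collapses each origin/month group to a single row
with_summarise <- toy %>%
  group_by(origin, month) %>%
  summarise(AMOUNT = sum(hour, na.rm = TRUE), .groups = "drop")

nrow(with_mutate)     # one row per input row
nrow(with_summarise)  # one row per origin/month group
```

Both versions carry the same totals, but only the summarised one gives geom_point() a single point per month.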

Here is an example with facets that took <1 sec to compute:

flights %>%
  filter(year == 2013) %>%
  select(month, origin, hour, year) %>%
  group_by(origin, month) %>%
  summarise(AMOUNT = sum(hour, na.rm = TRUE)) %>%
  ggplot(aes(month, AMOUNT), na.rm = TRUE) +
  geom_point() +
  labs(subtitle = "Year 2013") +
  facet_wrap(~origin)

edited Oct 1, 2021 at 10:03

answered Oct 1, 2021 at 9:58


2 Comments

Josse_ Over a year ago

I used mutate because I thought summarise would drop the other variables, but I was mistaken. Should the = between my_plots and unique be <- ?

Dan Chaltiel Over a year ago

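For a top-level assignment like this one, = and <- do the same thing (they only differ in a few contexts, such as inside a function call, where = names an argument). Coming from other programming languages, I prefer = but you can choose whichever looks better to you :-)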
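
A short base R sketch of where the two operators agree and where they do not. The function scope_demo() and the variable names are made up for this illustration.

```r
# At the top level both operators assign
a = c(1, 2, 3)
b <- c(1, 2, 3)
same_value <- identical(a, b)

# Inside a call, `=` names an argument while `<-` assigns and then passes the value
scope_demo <- function() {
  env <- environment()
  m1 <- mean(x = c(1, 2, 3))   # x is only the argument name of mean()
  after_equals <- exists("x", envir = env, inherits = FALSE)
  m2 <- mean(x <- c(1, 2, 3))  # creates x in this function's scope first
  after_arrow <- exists("x", envir = env, inherits = FALSE)
  list(m1 = m1, m2 = m2, after_equals = after_equals, after_arrow = after_arrow)
}
res <- scope_demo()
```

For the my_plots assignment in the answer, either operator gives the same result.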

stefan · Accepted Answer · 2021-10-01 09:55:54Z

One option would be to break your pipeline into two parts, data wrangling and plotting. Doing so, you could check whether the filtered and aggregated dataset contains any data using e.g. nrow(d) > 0 and return NULL if it doesn't (an if without an else evaluates to NULL when its condition is FALSE). In your for loop you could then check for NULL before plotting.

To mimic your use case I used flights$year[flights$origin == "EWR"] <- 2015 so that the example data includes an origin with no data for year 2013:

library(nycflights13)
library(tidyverse)

plotter_de_plot <- function(origination, YEARR) {
  d <- flights %>%
    select(month, origin, hour, year) %>%
    filter(
      !is.na(hour),
      origin == origination, year == YEARR
    ) %>% 
    group_by(month) %>%
    mutate(AMOUNT = sum(hour, na.rm = TRUE))
    
  if (nrow(d) > 0) {
    ggplot(d, aes(month, AMOUNT), na.rm = TRUE) +
      geom_point() +
      labs(title = origination, subtitle = YEARR)  
  }
}

flights$year[flights$origin == "EWR"] <- 2015

for (i in unique(flights$origin)) {
  p <- plotter_de_plot(i, 2013)
  if (!is.null(p)) plot(p)
}
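
The NULL check can also be combined with the purrr::map approach from the other answer, so that the result is a named list containing only the plots that exist. The following is a sketch under stated assumptions: toy is a hypothetical data frame, and plot_or_null() is an illustrative variant of plotter_de_plot() that takes the data as an argument and uses summarise().

```r
library(dplyr)
library(ggplot2)
library(purrr)

# Toy stand-in for the asker's data (hypothetical values): EWR has no 2013 rows
toy <- data.frame(
  year   = c(2013, 2013, 2013, 2013, 2015, 2015),
  origin = c("JFK", "JFK", "LGA", "LGA", "EWR", "EWR"),
  month  = c(1, 2, 1, 2, 1, 2),
  hour   = c(5, 6, 7, 8, 9, 10),
  stringsAsFactors = FALSE
)

# Returns a ggplot, or NULL when there is nothing to plot
plot_or_null <- function(data, origination, YEARR) {
  d <- data %>%
    filter(!is.na(hour), origin == origination, year == YEARR) %>%
    group_by(month) %>%
    summarise(AMOUNT = sum(hour), .groups = "drop")
  if (nrow(d) == 0) return(NULL)
  ggplot(d, aes(month, AMOUNT)) +
    geom_point() +
    labs(title = origination, subtitle = YEARR)
}

plots <- unique(toy$origin) %>%
  set_names() %>%
  map(~ plot_or_null(toy, .x, 2013)) %>%
  discard(is.null)   # drop origins that produced no plot

names(plots)
```

Each remaining element can then be printed individually, for example plots$JFK.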
