How to make multiple ggplots in a loop with conditional labels

Question

  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274

I have the above dataframe, and I want to generate one graph per value column in ggplot2 (three for this data), each with the "Name" column on the x axis and that column's values on the y axis. The x axis doesn't need tick marks, but I do want to label each point with its "Name" value whenever its y value is below a cutoff, say 0. Below is my code using base R's plot() function, which generates the graphs automatically in a for loop. I've attached one sample graph.

cutoff <- 0
df <- read.csv("Book4.csv", header = TRUE)
labels <- df$Name                        # point labels come from the Name column
for (i in setdiff(names(df), "Name")) {  # one plot per value column
  png(filename = paste0(i, ".png"))      # paste0() gives "Value1.png", not "Value1 .png"
  plot(df[, i],
       main = i,
       ylab = "Values",
       xlab = "Names",
       col = ifelse(df[, i] < cutoff, "red", "gray"),
       pch = ifelse(df[, i] < cutoff, 10, 1)
  )
  abline(h = cutoff, col = "blue", lty = 2)  # dashed line at the cutoff
  outlier <- which(df[, i] <= cutoff)
  if (length(outlier) > 0) {
    text(outlier, df[outlier, i], labels[outlier], cex = 0.7, pos = 2)
  }
  dev.off()
}

Sample Graph generated

The issue is that these labels are often hidden or, with larger datasets, overlap so that I can't read them. Hence I wanted to reproduce this using ggplot2 and the function geom_text_repel() from ggrepel. I attempted to do this with a for loop, but got stuck on the point labelling with geom_text_repel(), as I wasn't sure how to label conditionally with it. I will be producing upwards of 200 PNGs, so I'd greatly appreciate it if this could be automated, with the files named "Value1.png", "Value2.png" and so forth.
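For the requested file names, note that paste() separates its arguments with a space by default, so paste(i, ".png") produces "Value1 .png". A minimal base-R sketch, assuming the value columns are every column except "Name" (the toy data below simply reuses the values shown above):

```r
# Toy data with the same column layout as the question
df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  Value2 = c(-1.1651404, -0.7270722, -0.3891095, 1.3957877, 0.9429327),
  Value3 = c(1.3516952, -1.9870423, 0.9123736, -0.2689896, 1.0719274),
  stringsAsFactors = FALSE
)

# Every column except the label column gets its own plot
value_cols <- setdiff(names(df), "Name")

# paste() uses sep = " " by default, which inserts a space ...
paste(value_cols[1], ".png")
#> [1] "Value1 .png"

# ... while paste0() uses no separator, giving the requested names
file_names <- paste0(value_cols, ".png")
file_names
#> [1] "Value1.png" "Value2.png" "Value3.png"
```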

Here is my attempt in ggplot2:

cutoff = 0
df = read.csv("Book4.csv", header = TRUE, row.names = 1)    
for(i in colnames(df)){
      png(filename = paste(i,".png"))
      outlier = which(df[,i]<=cutoff)
      print(ggplot(df, aes(x = rownames(df), y = df[,i])) +
              geom_point() + 
              geom_text_repel(data = df, label=outlier))
      dev.off()
    }

I keep getting the error "Error: Aesthetics must be either length 1 or the same as the data (5): label" and am not sure how to fix it.

stefan · Accepted Answer · 2020-10-17 08:33:36Z


You could achieve your desired result like so:

While using df[,i] inside aes() will work in most cases, it is not recommended, and there are cases where it will not work, e.g. when a layer uses a different (filtered) dataset than the plot, so that the lengths no longer match. Instead, if you want to refer to variables by strings, you can use the so-called .data pronoun, i.e. use .data[[i]].
To get the conditional labels, map ifelse(.data[[i]] <= cutoff, Name, "") on the label aesthetic. Note that this must go inside aes().

library(ggplot2)
library(ggrepel)

cutoff <- 0

# Loop over the value columns only (skip the label column "Name")
for (i in setdiff(colnames(df), "Name")) {
  png(filename = paste0(i, ".png"))   # paste0() gives "Value1.png" without a space
  gg <- ggplot(df, aes(x = Name, y = .data[[i]])) +
    geom_point() +
    geom_text_repel(aes(label = ifelse(.data[[i]] <= cutoff, Name, "")))
  print(gg)
  dev.off()
}
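The label mapping above is just a vector with one entry per row: the name where the value is at or below the cutoff and "" elsewhere. Because its length equals nrow(df), it passes the "length 1 or the same as the data" check that the index vector from which() failed. A base-R sketch of that vector, plus one assumption worth checking: if Name is a factor (the default for read.csv() before R 4.0), ifelse() returns the integer codes rather than the names, so as.character() is a safe guard.

```r
df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  stringsAsFactors = FALSE
)
cutoff <- 0

# One label per row: the name at or below the cutoff, an empty string elsewhere
labels <- ifelse(df$Value1 <= cutoff, df$Name, "")
labels
#> [1] "A1" ""   ""   ""   "A5"

# If Name were a factor, ifelse() would return its integer codes instead
name_factor <- factor(df$Name)
codes <- ifelse(df$Value1 <= cutoff, name_factor, "")
#> "1" "" "" "" "5"

# as.character() restores the actual names
safe_labels <- ifelse(df$Value1 <= cutoff, as.character(name_factor), "")
```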

EDIT: If you want to use filter, first add the row names as a new variable to your dataset, e.g. df$x <- rownames(df), and map that on x. (I guess this is the reason for your error message: rownames(df) always has as many elements as the full dataset, while the filtered data has fewer rows.) Afterwards you can pass data = dplyr::filter(df, .data[[i]] <= cutoff) to the layer.

Note: one caveat is in order. This approach is fine if you want to add another geom_point with only a subset of your data. For geom_text_repel, however, it is not recommended (that's why I used ifelse). The reason is that geom_text_repel can only do a good job if it knows about all the data points. If you pass only a subset, the labels will in general overlap with the points missing from the subset, as geom_text_repel does not know that these are there.

df$x <- row.names(df)
# Skip the label column and the helper x column
for (i in setdiff(colnames(df), c("Name", "x"))) {
  png(filename = paste0(i, ".png"))
  gg <- ggplot(df, aes(x = x, y = .data[[i]])) +
    geom_point() +
    geom_text_repel(data = dplyr::filter(df, .data[[i]] <= cutoff),
                    aes(x = x, y = .data[[i]], label = Name))
  print(gg)
  dev.off()
}
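Why the filter version fails when x is mapped to rownames(df): that expression is evaluated against the full df, so it stays at the original length while the filtered layer has fewer rows. A base-R sketch of the mismatch, using subset-by-index as a stand-in for dplyr::filter() (an assumption made only to keep the example dependency-free):

```r
df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  stringsAsFactors = FALSE
)
cutoff <- 0

# rownames(df) always has 5 elements, but the filtered layer data has only 2 rows,
# which is exactly the "length 1 or the same as the data" mismatch
sub_bad <- df[df$Value1 <= cutoff, ]
c(x = length(rownames(df)), rows = nrow(sub_bad))

# Storing the row names as a column first keeps x aligned with each filtered row
df$x <- rownames(df)
sub_ok <- df[df$Value1 <= cutoff, ]
sub_ok$x
#> [1] "1" "5"
```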

Data

df <- structure(list(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  Value2 = c(-1.1651404, -0.7270722, -0.3891095, 1.3957877, 0.9429327),
  Value3 = c(1.3516952, -1.9870423, 0.9123736, -0.2689896, 1.0719274)
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

edited Oct 17, 2020 at 8:33

answered Oct 14, 2020 at 17:16


2 Comments

Eliot Behr Over a year ago

Thank you! Out of curiosity, is there any way to implement this using "data = filter" instead of "ifelse", as in "geom_point(data = filter(df, .data[[i]] <= cutoff), aes(color = factor(othervariable)))", where othervariable is a categorical variable in the last column? For some reason, when I use the ifelse method for this, the results don't look as appealing and the legends are oversized. However, the "data = filter" method gives me the same "Error: Aesthetics must be either length 1 or the same as the data (4): x".

stefan Over a year ago

Hi Eliot. I just made an edit to include the version using filter and also a possible solution for your error. Concerning the unappealing result: at least on the sample data and on my machine there is no real difference between the two approaches (except for the positioning of the labels; see my note on this). Best, S.

jared_mamrot · Accepted Answer · 2020-10-16 02:41:25Z


Another approach is to create a plotting function, then apply the function to each 'Value', e.g.

library(tidyverse)
library(ggrepel)

plot_data <- function(ValueX) {
  ValueX <- ensym(ValueX)
  p <- ggplot(df, aes(y = !!ValueX,
                      x = Name)) +
    geom_text_repel(aes(label = ifelse(!!ValueX < 0,
                                       Name, NA))) +
    geom_point() +
    theme_bw(base_family = "Helvetica", base_size = 14) +
    ggtitle(rlang::as_string(ValueX)) +   # title as a plain string
    theme(axis.ticks.x = element_blank(),
          legend.position = "none")
  ggsave(filename = paste(rlang::as_string(ValueX),
                          "plot.png",
                          sep = "_"),
         plot = p,                        # save this plot explicitly
         device = "png")
}

# read.table() uses the first (unnamed) column as row names
df <- read.table(text = "  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274", header = TRUE)

## Collate unaltered colnames into a vector
vector_of_colnames <- colnames(df)[-1]

## Plot
lapply(vector_of_colnames, plot_data)

Whether this approach is useful for you depends on your use case. In my own work I have had to generate up to 35,000 plots at a time, and this approach has advantages over a loop. For example, I typically collate the plots into a single PDF instead of producing lots of separate files (for this example, one file with 3 pages, one plot per page):

library(tidyverse)
library(ggrepel)

plot_data <- function(ValueX) {
  ValueX <- ensym(ValueX)
  ggplot(df, aes(y = !!ValueX,
                 x = Name)) +
    geom_text_repel(aes(label = ifelse(!!ValueX < 0,
                                       Name, NA))) +
    geom_point() +
    theme_bw(base_family = "Helvetica", base_size = 14) +
    ggtitle(rlang::as_string(ValueX)) +
    theme(axis.ticks.x = element_blank(),
          legend.position = "none")
}

# read.table() uses the first (unnamed) column as row names
df <- read.table(text = "  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274", header = TRUE)

## Collate unaltered colnames into a vector
vector_of_colnames <- colnames(df)[-1]

## Build all plots, then print each one as a page of a single PDF
plots <- lapply(vector_of_colnames, plot_data)
pdf(file = "All_plots.pdf")
for (p in plots) print(p)
dev.off()
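A sketch of a variant that combines both answers, assuming ggplot2 is installed: the hypothetical helper make_value_plot() takes the column name as a string and uses the .data pronoun instead of ensym()/!!, and geom_text() stands in for geom_text_repel() only to keep the example free of the ggrepel dependency. The PDF goes to tempdir() here rather than the working directory.

```r
library(ggplot2)

df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  Value2 = c(-1.1651404, -0.7270722, -0.3891095, 1.3957877, 0.9429327),
  Value3 = c(1.3516952, -1.9870423, 0.9123736, -0.2689896, 1.0719274),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `col` is a column name given as a string,
# looked up through the .data pronoun
make_value_plot <- function(col, data, cutoff = 0) {
  ggplot(data, aes(x = Name, y = .data[[col]])) +
    geom_point() +
    # geom_text() stands in for ggrepel::geom_text_repel() in this sketch
    geom_text(aes(label = ifelse(.data[[col]] < cutoff, Name, "")), vjust = -0.8) +
    ggtitle(col)
}

value_cols <- setdiff(names(df), "Name")
plots <- lapply(value_cols, make_value_plot, data = df)

# One PDF, one page per plot; print() draws each ggplot on the open device
pdf_file <- file.path(tempdir(), "All_plots.pdf")
pdf(pdf_file)
for (p in plots) print(p)
invisible(dev.off())
```

Because the column name stays an ordinary string, the same helper can be called directly in a loop (make_value_plot(i, df)) without any tidy-eval machinery.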

answered Oct 16, 2020 at 2:41


2 Comments

Eliot Behr Over a year ago

Thank you! Could you please explain the meaning of "!!" in "!!ValueX", as well as why the "ValueX <- ensym(ValueX)" line is needed?

jared_mamrot Over a year ago

It's to do with how the code is evaluated (tidy eval: tidyeval.tidyverse.org/sec-why-how.html#unquoting-code). Basically, ensym() captures the argument you pass as a name (a symbol, e.g. Value1) instead of evaluating it, and '!!' inserts that name into aes(), so ggplot2 looks up the column with that name in the data and uses its values (e.g. -0.05970872, 0.44143488, etc.). It's a little complicated, but learning these concepts is super useful as you develop your understanding of the language. Also see: adv-r.hadley.nz/quasiquotation.html / tidyverse.org/blog/2018/07/ggplot2-tidy-evaluation
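A small sketch of the two steps outside ggplot2, assuming the rlang package is installed (show_expr() is a hypothetical name used only for illustration): ensym() turns the argument into a symbol without evaluating it, !! splices that symbol into a new expression, and evaluating the expression against a data frame then resolves the symbol to the column's values.

```r
library(rlang)

df <- data.frame(Value1 = c(-0.05970872, 0.44143488, -1.39354557))

show_expr <- function(ValueX) {
  ValueX <- ensym(ValueX)   # capture the argument as a symbol, e.g. Value1
  expr(mean(!!ValueX))      # !! injects that symbol into a new expression
}

e1 <- show_expr(Value1)     # a bare name works ...
e2 <- show_expr("Value1")   # ... and so does a string
e1
#> mean(Value1)

# Only now is Value1 looked up, as a column of df
eval_tidy(e1, data = df)
```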
