  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274

I have the above data frame, and I want to generate one graph per value column in ggplot2, each with the "Name" column on the x axis and that column's values on the y axis. The x axis doesn't need tick marks, but I do want to label a point with its "Name" value whenever its y value is below a cutoff, say 0. Below is my code using base R's plot() function to generate the graphs automatically with a for loop. I've attached one sample graph.

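cutoff <- 0
df <- read.csv("Book4.csv", header = TRUE)
point_names <- df$Name                    # labels for the points (avoids masking base::list)
value_cols <- setdiff(names(df), "Name")  # plot only the value columns
for (i in value_cols) {
  png(filename = paste0(i, ".png"))       # "Value1.png", not "Value1 .png"
  plot(df[, i],
       main = i,
       ylab = "Values",
       xlab = "Names",
       col = ifelse(df[, i] <= cutoff, "red", "gray"),
       pch = ifelse(df[, i] <= cutoff, 10, 1)
  )
  abline(h = cutoff, col = "blue", lty = 2)  # dashed horizontal line at the cutoff
  outlier <- which(df[, i] <= cutoff)
  if (length(outlier) > 0) {
    text(outlier, df[outlier, i], point_names[outlier], cex = 0.7, pos = 2)
  }
  dev.off()
}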

Sample Graph generated

The problem is that the labels are often hidden and, with larger datasets, overlap so much that I can't read them. Hence, I wanted to reproduce this with ggplot2 and geom_text_repel() from the ggrepel package. I tried using a for loop, but got stuck on the point labelling with geom_text_repel(), as I wasn't sure how to label conditionally with it. I will be producing upwards of 200 PNGs, so I'd greatly appreciate it if this could be automated, with the files named "Value1.png", "Value2.png" and so forth.

Here is my attempt in ggplot2:

cutoff = 0
df = read.csv("Book4.csv", header = TRUE, row.names = 1)    
for(i in colnames(df)){
      png(filename = paste(i,".png"))
      outlier = which(df[,i]<=cutoff)
      print(ggplot(df, aes(x = rownames(df), y = df[,i])) +
              geom_point() + 
              geom_text_repel(data = df, label=outlier))
      dev.off()
    }

I keep getting the error "Error: Aesthetics must be either length 1 or the same as the data (5): label" and am not sure how to fix it.
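A quick way to see where the error comes from is to compare the lengths involved. This sketch rebuilds only the Name and Value1 columns of the sample data by hand instead of reading Book4.csv:

```r
# Sample data from the question (assumption: Book4.csv holds these values)
df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  stringsAsFactors = FALSE
)
cutoff <- 0

# which() returns the row *indices* of the points at or below the cutoff,
# so label = outlier supplies only 2 values for a data set with 5 rows
outlier <- which(df$Value1 <= cutoff)
length(outlier)
nrow(df)

# A label vector that lines up with the data: one entry per row,
# with "" wherever no label should be drawn
labels <- ifelse(df$Value1 <= cutoff, df$Name, "")
labels
```

Every aesthetic must have either length 1 or one value per row of the layer's data, which is why a full-length ifelse() vector works where the indices from which() do not.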

2 Answers


You could achieve your desired result like so:

  1. While using df[,i] inside aes() will work in most cases, it is not recommended: it bypasses the data passed to ggplot(), so it breaks as soon as a layer gets different data, e.g. a filtered subset. Instead, if you want to refer to variables by strings, use the so-called .data pronoun, i.e. .data[[i]].

  2. To get the conditional labels, map ifelse(.data[[i]] <= cutoff, Name, "") to the label aesthetic. Note that this has to happen inside aes()!

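library(ggplot2)
library(ggrepel)

cutoff <- 0

# Plot every column except "Name"
for (i in setdiff(colnames(df), "Name")) {
  png(filename = paste0(i, ".png"))  # gives "Value1.png", "Value2.png", ...
  gg <- ggplot(df, aes(x = rownames(df), y = .data[[i]])) +
    geom_point() +
    # label only the points at or below the cutoff; "" draws no label
    geom_text_repel(aes(label = ifelse(.data[[i]] <= cutoff, Name, "")))
  print(gg)  # an explicit print() is needed inside a loop
  dev.off()
}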
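If you'd rather not manage graphics devices with png()/dev.off(), the same loop can be sketched with ggsave(), which takes the plot object directly. The dashed cutoff line mirrors the abline() in the question's base R code, x is mapped to the Name column, and the width, height and dpi values are arbitrary choices, not requirements:

```r
library(ggplot2)
library(ggrepel)

# Sample data typed in by hand (assumption: matches the question's data)
df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  Value2 = c(-1.1651404, -0.7270722, -0.3891095, 1.3957877, 0.9429327),
  Value3 = c(1.3516952, -1.9870423, 0.9123736, -0.2689896, 1.0719274),
  stringsAsFactors = FALSE
)
cutoff <- 0

# Assumption: every column except "Name" holds values to plot
value_cols <- setdiff(colnames(df), "Name")

for (i in value_cols) {
  gg <- ggplot(df, aes(x = Name, y = .data[[i]])) +
    geom_point() +
    # dashed line at the cutoff, like abline() in the base R version
    geom_hline(yintercept = cutoff, colour = "blue", linetype = "dashed") +
    geom_text_repel(aes(label = ifelse(.data[[i]] <= cutoff, Name, ""))) +
    ggtitle(i) +
    theme(axis.ticks.x = element_blank())
  # paste0() (no separator) gives "Value1.png", "Value2.png", ...
  ggsave(filename = paste0(i, ".png"), plot = gg,
         width = 6, height = 4, dpi = 150)
}
```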


EDIT: If you want to use filter(), it's best to first add the row names as a new variable to your dataset, e.g. df$x <- rownames(df), and map that variable on x. Mapping x = rownames(df) directly refers to the full data frame, so its length no longer matches the number of rows in the filtered data, which is most likely why you get the error message. Afterwards you can use data = dplyr::filter(df, .data[[i]] <= cutoff) as the dataset for the layer.

Note: one caveat is in order, though. This approach is fine if you want to add another geom_point() with only a subset of your data. For geom_text_repel(), however, it is not recommended (that's why I used ifelse()). The reason is that geom_text_repel() can only do a good job of placing labels if it knows about all of the data. If you pass only a subset, the labels will in general overlap with the points missing from that subset, because geom_text_repel() does not know they are there.

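df$x <- row.names(df)
# Plot only the value columns (not "Name" or the helper column "x")
for (i in setdiff(colnames(df), c("Name", "x"))) {
  png(filename = paste0(i, ".png"))
  gg <- ggplot(df, aes(x = x, y = .data[[i]])) +
    geom_point() +
    # labels use only the points at or below the cutoff (see the caveat above)
    geom_text_repel(data = dplyr::filter(df, .data[[i]] <= cutoff), aes(x = x, y = .data[[i]], label = Name))
  print(gg)
  dev.off()
}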
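Putting the EDIT and the note together, here is a sketch that uses a filtered subset only where it is harmless (an extra geom_point() layer highlighting the points at or below the cutoff) while the labels still come from the full data via ifelse(). It maps x to the Name column so both data sets carry the x variable themselves; the red colour and point size are arbitrary:

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  stringsAsFactors = FALSE
)
cutoff <- 0
i <- "Value1"

# Subset used only for the highlight layer
below <- dplyr::filter(df, .data[[i]] <= cutoff)

gg <- ggplot(df, aes(x = Name, y = .data[[i]])) +
  geom_point(colour = "gray") +
  # A subset is fine here: x and y come from columns of the subset itself
  geom_point(data = below, colour = "red", size = 3) +
  # Labels get the full data so geom_text_repel() can avoid every point
  geom_text_repel(aes(label = ifelse(.data[[i]] <= cutoff, Name, "")))

built <- ggplot_build(gg)
```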

Data

df <- structure(list(Name = c("A1", "A2", "A3", "A4", "A5"), Value1 = c(
      -0.05970872,
      0.44143488, 0.34616897, 0.49289331, -1.39354557
    ), Value2 = c(
      -1.1651404,
      -0.7270722, -0.3891095, 1.3957877, 0.9429327
    ), Value3 = c(
      1.3516952,
      -1.9870423, 0.9123736, -0.2689896, 1.0719274
    )), class = "data.frame", row.names = c(
      "1",
      "2", "3", "4", "5"
    ))

2 Comments

Thank you! Out of curiosity, is there any way to implement this using "data=filter" instead of "ifelse", as in "geom_point(data=filter(df, .data[[i]]<=cutoff), aes(color=factor(othervariable)))", where othervariable is a categorical variable in the last column? For some reason, when I use the ifelse method for this, the results don't look as appealing and the legends are oversized. However, the "data=filter" method returns the same "Error: Aesthetics must be either length 1 or the same as the data (4): x".
Hi Eliot. I just made an edit to include the "version" using filter and also a possible solution for your error. Concerning the not appealing result... At least on the sample data and on my machine there is "no" difference between the two approaches (except for the positioning of the labels. See my note on this). Best S.

Another approach is to create a plotting function, then apply the function to each 'Value', e.g.

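library(tidyverse)
library(ggrepel)

plot_data <- function(ValueX) {
  # Convert the column name passed in (e.g. "Value1") into a symbol
  ValueX <- rlang::ensym(ValueX)
  p <- ggplot(df, aes(y = !!ValueX,
                      x = Name)) +
    # Label only points below 0; NA labels are dropped (with a warning)
    geom_text_repel(aes(label = ifelse(!!ValueX < 0,
                                       Name, NA))) +
    geom_point() +
    theme_bw(base_family = "Helvetica", base_size = 14) +
    ggtitle(rlang::as_string(ValueX)) +
    theme(axis.ticks.x = element_blank(),
          legend.position = "none")
  # Save the plot built above explicitly rather than relying on last_plot()
  ggsave(filename = paste(rlang::as_string(ValueX),
                          "plot.png",
                          sep = "_"),
         plot = p,
         device = "png")
}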

df <- read.table(header = TRUE, text = "  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274")
# The header has one field fewer than the data rows, so read.table()
# uses the first column (1-5) as row names

## Collate unaltered colnames into a vector
vector_of_colnames <- colnames(df)[-1]

## Plot
lapply(vector_of_colnames, plot_data)

Value1_plot.png

Value2_plot.png

Value3_plot.png
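One thing to keep in mind with ensym(): it captures the expression you pass. With lapply() this works because lapply() forces each argument before the call, so ensym() receives the string itself, but calling plot_data(v) inside a for loop would capture the symbol v rather than the column name it holds. Below is a sketch of a string-based variant that uses the .data pronoun instead and therefore behaves the same in both cases; plot_data_str() is a hypothetical name, and unlike plot_data() it returns the plot rather than saving it:

```r
library(ggplot2)
library(ggrepel)

df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  Value2 = c(-1.1651404, -0.7270722, -0.3891095, 1.3957877, 0.9429327),
  stringsAsFactors = FALSE
)

# Hypothetical variant of plot_data(): takes the column name as a plain string
plot_data_str <- function(col, cutoff = 0) {
  ggplot(df, aes(x = Name, y = .data[[col]])) +
    geom_text_repel(aes(label = ifelse(.data[[col]] < cutoff, Name, NA))) +
    geom_point() +
    ggtitle(col)
}

# Works the same in a for loop ...
plots <- list()
for (v in c("Value1", "Value2")) plots[[v]] <- plot_data_str(v)

# ... and with lapply()
plots2 <- lapply(c("Value1", "Value2"), plot_data_str)
```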

Whether this approach is useful depends on your use case. In my own work I have had to generate up to 35,000 plots at a time, and a plotting function has advantages over a loop there. For example, I typically collate the images into a single PDF instead of producing lots of separate files (for this example, one file with 3 pages, one plot per page):

library(tidyverse)
library(ggrepel)

plot_data <- function(ValueX) {
  # Convert the column name passed in (e.g. "Value1") into a symbol
  ValueX <- rlang::ensym(ValueX)
  ggplot(df, aes(y = !!ValueX,
                 x = Name)) +
    # Label only points below 0; NA labels are dropped (with a warning)
    geom_text_repel(aes(label = ifelse(!!ValueX < 0,
                                       Name, NA))) +
    geom_point() +
    theme_bw(base_family = "Helvetica", base_size = 14) +
    ggtitle(rlang::as_string(ValueX)) +
    theme(axis.ticks.x = element_blank(),
          legend.position = "none")
}

df <- read.table(header = TRUE, text = "  Name      Value1     Value2     Value3
1   A1 -0.05970872 -1.1651404  1.3516952
2   A2  0.44143488 -0.7270722 -1.9870423
3   A3  0.34616897 -0.3891095  0.9123736
4   A4  0.49289331  1.3957877 -0.2689896
5   A5 -1.39354557  0.9429327  1.0719274")
# The header has one field fewer than the data rows, so read.table()
# uses the first column (1-5) as row names

## Collate unaltered colnames into a vector
vector_of_colnames <- colnames(df)[-1]

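pdf(file = "All_plots.pdf")
plots <- lapply(vector_of_colnames, plot_data)
invisible(lapply(plots, print))  # print explicitly so the pages are drawn even when the script is source()d
dev.off()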

2 Comments

Thank you! Could you please explain the meaning of "!!" in "!!ValueX", and why the "ValueX <- ensym(ValueX)" line is needed?
It's to do with how the code is evaluated (tidy eval: tidyeval.tidyverse.org/sec-why-how.html#unquoting-code). Basically, ensym() takes the argument you pass (e.g. the string "Value1") and turns it into a symbol, i.e. a column name, rather than treating it as the literal variable ValueX. The !! then unquotes (injects) that symbol into aes(), so ggplot2 looks up the column Value1 in df and uses its values (e.g. -0.05970872, 0.44143488, etc.). It's a little complicated, but learning these concepts is super useful as you develop your understanding of the language. Also see: adv-r.hadley.nz/quasiquotation.html / tidyverse.org/blog/2018/07/ggplot2-tidy-evaluation
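To make the two operators a bit more concrete, here is a small sketch with a column name held in a string. rlang::sym() does the string-to-symbol step that ensym() performs on a function argument, and !! injects the result into aes(); the .data pronoun used in the other answer reaches the same column without any injection:

```r
library(ggplot2)

df <- data.frame(
  Name   = c("A1", "A2", "A3", "A4", "A5"),
  Value1 = c(-0.05970872, 0.44143488, 0.34616897, 0.49289331, -1.39354557),
  stringsAsFactors = FALSE
)
col <- "Value1"  # column name stored as a string

# Option 1: turn the string into a symbol and inject it with !!
p1 <- ggplot(df, aes(x = Name, y = !!rlang::sym(col))) + geom_point()

# Option 2: look the column up by name through the .data pronoun
p2 <- ggplot(df, aes(x = Name, y = .data[[col]])) + geom_point()

# Both mappings resolve to the values of df$Value1, not to the string "Value1"
y1 <- ggplot_build(p1)$data[[1]]$y
y2 <- ggplot_build(p2)$data[[1]]$y
```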
