Add missing legend to ggplot when input data is a list of data frames

Question

I want to plot lines for separate data frames in the same graphic with a different color for each data frame. I can get a legend using almost the same code and aes(colour = "hard-coded-name") but I don't know the names ahead of time. I don't have enough RAM to rbind the data frames into a single data frame. I've written a sample that produces the plot with the colored lines. How do I add a legend? As in the sample, you don't know ahead of time how many data frames are in the list (ldf) or what their names are.

library('ggplot2')

f30 <- function() {
    ###############################################################
    ##### Create a list with a random number of data frames #######
    ##### The names of the list elements are "random"       #######
    ###############################################################
    f1 <- function(i) {
        b <- sample(1:10, sample(8:10, 1))
        a <- sample(1:100, length(b))
        data.frame(Before = b, After = a)
    }
    ldf <- sapply(1:sample(2:8,1), f1, simplify = FALSE)
    names(ldf) <- LETTERS[sample(1:length(LETTERS), length(ldf))]

    palette <- c(
        "#000000", "#E69F00", "#56B4E9", "#009E73", 
        "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
    )

    ###############################################################
    ##### Above this point we're just creating a sample ldf #######
    ###############################################################

    ePlot <- new.env(parent = emptyenv())
    fColorsButNoLegend <- function(ix) {
        df <- ldf[[ix]]
        n <- names(ldf)[ix]
        if (ix == 1) {
            ePlot$p <- ggplot(df, aes(x = Before, y = After)) + 
                geom_line(colour = palette[ix])
        } else {
            ePlot$p <- ePlot$p + 
                geom_line(
                    colour = palette[ix],
                    aes(x = Before, y = After), 
                    df
                )
        }
    }
    sapply(1:length(ldf), fColorsButNoLegend)

    #Add the title and display the plot
    a <- paste(names(ldf), collapse = ', ')
    ePlot$p <- ePlot$p + 
        ggtitle(paste("Before and After:", a))
    ePlot$p
}

A line plot does not need huge data frames. If your data frames are too big to combine, they are larger than the plot needs. Use subsamples and combine those.
– Roland, Commented Oct 17, 2016 at 19:49
That is a good point. Actually, though, this is just one part of a larger app that is memory constrained, so I don't want to add needless pressure. I'm very new to ggplot. If I write a general subroutine that uses line plots, would your advice be to add code to check the size and use subsamples? At what number of x points would I want the subsampling to kick in?
– Jim Cutler, Commented Oct 22, 2016 at 11:29
That depends on the nature of your data. If you have smooth data, you can use smaller subsamples; if you have very dynamic data with many peaks and such, you might need larger subsamples.
– Roland, Commented Oct 22, 2016 at 11:34
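
The subsampling idea from the comments could be sketched roughly as follows. This is only an illustration, not code from the thread: the sample `ldf`, the `subsample` helper, and the `max_rows` cap of 200 are all assumptions made up for the example, and the right cap depends on how jagged the data is.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the question's ldf: a named list of data frames
ldf <- list(
  A = data.frame(Before = 1:1000, After = cumsum(rnorm(1000))),
  B = data.frame(Before = 1:500,  After = cumsum(rnorm(500)))
)

max_rows <- 200  # assumed cap per data frame, not a recommended value

# Hypothetical helper: thin a data frame to at most n rows and tag it
# with the name of its list element
subsample <- function(df, name, n = max_rows) {
  if (nrow(df) > n) {
    # sort() keeps the sampled rows in their original order
    df <- df[sort(sample(nrow(df), n)), , drop = FALSE]
  }
  df$source <- name
  df
}

# Combine only the thinned copies, so the full data is never duplicated
combined <- do.call(rbind, Map(subsample, ldf, names(ldf)))

p <- ggplot(combined, aes(x = Before, y = After, colour = source)) +
  geom_line()
```

Because `source` is an ordinary column of the combined data frame, ggplot2 draws the legend without any extra work.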

eipi10 · Accepted Answer · 2016-10-18 16:26:10Z


Let's put aside, for the moment, the issue of whether you would ever need to make a line plot with more data than could be held in RAM. Since the list elements are named, you can use those names to generate a color legend, even if you don't know beforehand what those names will be.

For example, in the code below, I add the name of the list element as a new source column in the data frame, and then use that source column as the colour aesthetic. Then, just before printing the plot, I add a scale_colour_manual statement in order to set the line colors to your color palette:

  ePlot <- new.env(parent = emptyenv())
  fColorsButNoLegend <- function(ix) {
    df <- ldf[[ix]]

    # Add name of list element as a new column
    df$source = names(ldf)[ix]

    if (ix == 1) {
      ePlot$p <- ggplot(df, aes(x = Before, y = After, colour=source)) + 
        geom_line()
    } else {
      ePlot$p <- ePlot$p + 
        geom_line(
          aes(x = Before, y = After, colour=source), 
          df
        )
    }
  }
  sapply(1:length(ldf), fColorsButNoLegend)

  #Add the title and display the plot
  a <- paste(names(ldf), collapse = ', ')
  ePlot$p <- ePlot$p + 
    ggtitle(paste("Before and After:", a)) +
    scale_colour_manual(values=palette)
  ePlot$p

Here's sample output from the function:

f30()
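
A possible variant of the same idea, for readers who would rather not add a column to each data frame: the list name can be injected into `aes()` as a constant, which is the programmatic equivalent of `aes(colour = "hard-coded-name")`. This is a sketch rather than the answer's method. It assumes ggplot2 3.0.0 or later (for `!!` inside `aes()`), unique list names, and no more data frames than palette entries; the sample `ldf` and the `cols` and `layers` names are made up for the example.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the question's ldf
ldf <- list(
  Q = data.frame(Before = 1:10, After = sample(1:100, 10)),
  C = data.frame(Before = 1:8,  After = sample(1:100, 8)),
  X = data.frame(Before = 1:9,  After = sample(1:100, 9))
)
palette <- c(
  "#000000", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)

# Name the colours so each list element keeps its palette position
cols <- setNames(palette[seq_along(ldf)], names(ldf))

# One layer per data frame; !!nm inserts the name as a constant string
layers <- lapply(names(ldf), function(nm) {
  geom_line(data = ldf[[nm]], aes(x = Before, y = After, colour = !!nm))
})

p <- ggplot() +
  layers +
  scale_colour_manual(name = "source", values = cols, breaks = names(ldf)) +
  ggtitle(paste("Before and After:", paste(names(ldf), collapse = ", ")))
```

Passing a named vector to `values` and the list names to `breaks` keeps both the colours and the legend entries in list order instead of alphabetical order.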

edited Oct 18, 2016 at 16:26

answered Oct 18, 2016 at 1:34

eipi10


Jim Cutler Over a year ago

Thank you very much for taking the time to respond. Please don't think I'm not grateful even though I'm providing an alternate solution.

Jim Cutler · Accepted Answer · 2016-10-22 14:14:35Z

Serendipitously, I saw how another graph package provides an alternative to a legend that saves screen real estate and, I would think, is more efficient than adding a column or duplicating data. I thought I would provide it here in case others might find it useful. It embeds the legend info in the empty space of the graph itself. See the fAnnotate function, which is primitive but enough to convey the idea.

library('ggplot2')

f30 <- function() {
  ###############################################################
  ##### Create a list with a random number of data frames #######
  ##### The names of the list elements are "random"       #######
  ###############################################################
  f1 <- function(i) {
    b <- sample(1:10, sample(8:10, 1))
    a <- sample(1:100, length(b))
    data.frame(Before = b, After = a)
  }
  ldf <- sapply(1:sample(2:8,1), f1, simplify = FALSE)
  names(ldf) <- LETTERS[sample(1:length(LETTERS), length(ldf))]

  palette <- c(
    "#000000", "#E69F00", "#56B4E9", "#009E73", 
    "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
  )

  ###############################################################
  ##### Above this point we're just creating a sample ldf #######
  ###############################################################

  ePlot <- new.env(parent = emptyenv())
  ePlot$xMin <- Inf
  ePlot$xMax <- -Inf
  ePlot$yMin <- Inf
  ePlot$yMax <- -Inf
  fColorsButNoLegend <- function(ix) {
    df <- ldf[[ix]]

    #Compute the boundaries of x and y 
    ePlot$xMin <- min(ePlot$xMin, min(df$Before))
    ePlot$xMax <- max(ePlot$xMax, max(df$Before))
    ePlot$yMin <- min(ePlot$yMin, min(df$After))
    ePlot$yMax <- max(ePlot$yMax, max(df$After))

    n <- names(ldf)[ix]
    if (ix == 1) {
      ePlot$p <- ggplot(df, aes(x = Before, y = After)) + 
        geom_line(colour = palette[ix])
    } else {
      ePlot$p <- ePlot$p + 
        geom_line(
          colour = palette[ix],
          aes(x = Before, y = After), 
          df
        )
    }
  }
  sapply(1:length(ldf), fColorsButNoLegend)

  #Divide by length+1 to leave room on either side of the labels
  xGap <- (ePlot$xMax - ePlot$xMin) / (length(ldf) + 1)
  fAnnotate <- function(ix) {
    x <- ePlot$xMin + (ix * xGap)
    lbl <- paste('---', names(ldf)[ix])
    b <- palette[ix]
    ePlot$p <- ePlot$p + 
      annotate("text", x = x, y = -Inf, vjust = -1, label = lbl, colour = b)
  }
  sapply(1:length(ldf), fAnnotate)

  #Add the title and display the plot
  allNames <- paste(names(ldf), collapse = ', ')
  ePlot$p <- ePlot$p + 
    ggtitle(paste("Before and After:", allNames))
  ePlot$p
}
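
The same in-plot labelling can probably be written more compactly, since `annotate()` accepts vectors for positions, labels, and colours. The following is a sketch under that assumption, not a replacement for the function above; the sample `ldf` and the names `x_range`, `label_x`, and `layers` are invented for the example.

```r
library(ggplot2)

set.seed(7)
# Hypothetical stand-in for the question's ldf
ldf <- list(
  M = data.frame(Before = 1:10, After = sample(1:100, 10)),
  K = data.frame(Before = 3:12, After = sample(1:100, 10))
)
palette <- c(
  "#000000", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)
n <- length(ldf)

# Overall x range across all data frames, computed without combining them
x_range <- range(vapply(ldf, function(df) range(df$Before), numeric(2)))

# Divide by n + 1 to leave room on either side of the labels
x_gap <- diff(x_range) / (n + 1)
label_x <- x_range[1] + seq_len(n) * x_gap

# One fixed-colour line layer per data frame
layers <- unname(Map(
  function(df, col) {
    geom_line(data = df, aes(x = Before, y = After), colour = col)
  },
  ldf, palette[seq_len(n)]
))

# A single annotate() call places all the labels along the bottom edge
p <- ggplot() +
  layers +
  annotate(
    "text", x = label_x, y = -Inf, vjust = -1,
    label = paste("---", names(ldf)), colour = palette[seq_len(n)]
  )
```

As in the original, the labels sit inside the plotting area, so they may overlap a line that dips to the bottom of the y range.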
