I want to plot lines for separate data frames in the same graphic with a different color for each data frame. I can get a legend using almost the same code and aes(colour = "hard-coded-name") but I don't know the names ahead of time. I don't have enough RAM to rbind the data frames into a single data frame. I've written a sample that produces the plot with the colored lines. How do I add a legend? As in the sample, you don't know ahead of time how many data frames are in the list (ldf) or what their names are.

library('ggplot2')

f30 <- function() {
    ###############################################################
    ##### Create a list with a random number of data frames #######
    ##### The names of the list elements are "random"       #######
    ###############################################################
    f1 <- function(i) {
        b <- sample(1:10, sample(8:10, 1))
        a <- sample(1:100, length(b))
        data.frame(Before = b, After = a)
    }
    ldf <- sapply(1:sample(2:8,1), f1, simplify = FALSE)
    names(ldf) <- LETTERS[sample(1:length(LETTERS), length(ldf))]

    palette <- c(
        "#000000", "#E69F00", "#56B4E9", "#009E73", 
        "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
    )

    ###############################################################
    ##### Above this point we're just creating a sample ldf #######
    ###############################################################

    ePlot <- new.env(parent = emptyenv())
    fColorsButNoLegend <- function(ix) {
        df <- ldf[[ix]]
        n <- names(ldf)[ix]
        if (ix == 1) {
            ePlot$p <- ggplot(df, aes(x = Before, y = After)) + 
                geom_line(colour = palette[ix])
        } else {
            ePlot$p <- ePlot$p + 
                geom_line(
                    colour = palette[ix],
                    aes(x = Before, y = After), 
                    df
                )
        }
    }
    sapply(1:length(ldf), fColorsButNoLegend)

    #Add the title and display the plot
    a <- paste(names(ldf), collapse = ', ')
    ePlot$p <- ePlot$p + 
        ggtitle(paste("Before and After:", a))
    ePlot$p
}
  • A lineplot does not need huge data.frames. If your data.frames are too big to combine they are larger than needed for the plot. Use subsamples and combine these. Commented Oct 17, 2016 at 19:49
  • That is a good point. This is just one part of a larger app that is memory constrained, though, so I don't want to add needless pressure. I'm very new to ggplot. If I write a general subroutine that uses lineplots, would your advice be to add code to check the size and use subsamples? At what number of x points would I want the subsampling to kick in? Commented Oct 22, 2016 at 11:29
  • That depends on the nature of your data. If you have smooth data you can use smaller subsamples; if you have very dynamic data with many peaks and such, you might need larger subsamples. Commented Oct 22, 2016 at 11:34
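
The subsampling suggested in the comments above could be sketched roughly as follows. This is only an illustration: thin_rows and its max_rows threshold are hypothetical names and values that do not appear in the original post, and the evenly spaced selection assumes the rows are already ordered by x.

```r
# Hypothetical helper (not from the original post): keep at most
# max_rows evenly spaced rows of df. Assumes df is already ordered by x.
thin_rows <- function(df, max_rows = 1000) {
  if (nrow(df) <= max_rows) {
    return(df)
  }
  # Evenly spaced row indices from the first row to the last row
  keep <- unique(round(seq(1, nrow(df), length.out = max_rows)))
  df[keep, , drop = FALSE]
}

# Made-up example data, large enough that thinning kicks in
big <- data.frame(Before = 1:100000, After = cumsum(rnorm(100000)))
small <- thin_rows(big, max_rows = 500)
```

Evenly spaced rows suit smooth data; for data with sharp peaks a larger max_rows, or a summary such as per-bin minima and maxima, may be needed, as the last comment points out.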

2 Answers

Let's put aside, for the moment, the issue of whether you would ever need to make a line plot with more data than could be held in RAM. Since the list elements are named, you can use those names to generate a color legend, even if you don't know beforehand what those names will be.

For example, in the code below, I add the name of the list element as a new source column in the data frame, and then use that source column as the colour aesthetic. Then, just before printing the plot, I add a scale_colour_manual statement in order to set the line colors to your color palette:

  ePlot <- new.env(parent = emptyenv())
  fColorsButNoLegend <- function(ix) {
    df <- ldf[[ix]]

    # Add name of list element as a new column
    df$source = names(ldf)[ix]

    if (ix == 1) {
      ePlot$p <- ggplot(df, aes(x = Before, y = After, colour=source)) + 
        geom_line()
    } else {
      ePlot$p <- ePlot$p + 
        geom_line(
          aes(x = Before, y = After, colour=source), 
          df
        )
    }
  }
  sapply(1:length(ldf), fColorsButNoLegend)

  #Add the title and display the plot
  a <- paste(names(ldf), collapse = ', ')
  ePlot$p <- ePlot$p + 
    ggtitle(paste("Before and After:", a)) +
    scale_colour_manual(values=palette)
  ePlot$p

Here's sample output from the function:

f30()

[image: sample plot produced by f30(), with a colour legend keyed by the list element names]
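
A possible variation, if adding a column to each data frame is undesirable: with ggplot2 3.0.0 or later, aes() supports the !! operator, so the list element name can be inlined as a constant. The sketch below also names the palette entries so that colours follow list order rather than the alphabetical order of the names. The small ldf here is a made-up stand-in for the one in the question.

```r
library(ggplot2)

# Made-up stand-in for the question's ldf; the names are arbitrary
ldf <- list(
  Q = data.frame(Before = 1:5, After = c(3, 8, 2, 9, 4)),
  C = data.frame(Before = 1:5, After = c(7, 1, 6, 5, 10))
)
palette <- c(
  "#000000", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)

# Start with an empty plot; each layer supplies its own data frame
p <- ggplot(mapping = aes(x = Before, y = After))
for (nm in names(ldf)) {
  # !!nm inlines the current name as a constant string, so the whole
  # layer maps to one legend key without modifying the data frame
  p <- p + geom_line(aes(colour = !!nm), data = ldf[[nm]])
}

# Named colours: first list element gets palette[1], and so on
cols <- setNames(palette[seq_along(ldf)], names(ldf))
p <- p + scale_colour_manual(name = "source", values = cols, breaks = names(ldf))

# print(p)  # uncomment to display the plot
```

This assumes the list names are unique and that there are no more list elements than palette entries.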

1 Comment

Thank you very much for taking the time to respond. Please don't think I'm not grateful even though I'm providing an alternate solution.

Serendipitously, I saw how another graph package provides an alternative to a legend that saves screen real estate and, I would think, is more efficient than adding a column or duplicating data. I thought I would provide it here in case others might find it useful. It embeds the legend info in the empty space of the graph itself. See the fAnnotate function - which is primitive but enough to provide the germ of an idea.

[image: plot with the legend labels embedded in the plot area]

library('ggplot2')

f30 <- function() {
  ###############################################################
  ##### Create a list with a random number of data frames #######
  ##### The names of the list elements are "random"       #######
  ###############################################################
  f1 <- function(i) {
    b <- sample(1:10, sample(8:10, 1))
    a <- sample(1:100, length(b))
    data.frame(Before = b, After = a)
  }
  ldf <- sapply(1:sample(2:8,1), f1, simplify = FALSE)
  names(ldf) <- LETTERS[sample(1:length(LETTERS), length(ldf))]

  palette <- c(
    "#000000", "#E69F00", "#56B4E9", "#009E73", 
    "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
  )

  ###############################################################
  ##### Above this point we're just creating a sample ldf #######
  ###############################################################

  ePlot <- new.env(parent = emptyenv())
  ePlot$xMin <- Inf
  ePlot$xMax <- -Inf
  ePlot$yMin <- Inf
  ePlot$yMax <- -Inf
  fColorsButNoLegend <- function(ix) {
    df <- ldf[[ix]]

    #Compute the boundaries of x and y 
    ePlot$xMin <- min(ePlot$xMin, min(df$Before))
    ePlot$xMax <- max(ePlot$xMax, max(df$Before))
    ePlot$yMin <- min(ePlot$yMin, min(df$After))
    ePlot$yMax <- max(ePlot$yMax, max(df$After))

    n <- names(ldf)[ix]
    if (ix == 1) {
      ePlot$p <- ggplot(df, aes(x = Before, y = After)) + 
        geom_line(colour = palette[ix])
    } else {
      ePlot$p <- ePlot$p + 
        geom_line(
          colour = palette[ix],
          aes(x = Before, y = After), 
          df
        )
    }
  }
  sapply(1:length(ldf), fColorsButNoLegend)

  #Divide by length+1 to leave room on either side of the labels
  xGap <- (ePlot$xMax - ePlot$xMin) / (length(ldf) + 1)
  fAnnotate <- function(ix) {
    x <- ePlot$xMin + (ix * xGap)
    lbl <- paste('---', names(ldf)[ix])
    b <- palette[ix]
    ePlot$p <- ePlot$p + 
      annotate("text", x = x, y = -Inf, vjust = -1, label = lbl, colour = b)
  }
  sapply(1:length(ldf), fAnnotate)

  #Add the title and display the plot
  allNames <- paste(names(ldf), collapse = ', ')
  ePlot$p <- ePlot$p + 
    ggtitle(paste("Before and After:", allNames))
  ePlot$p
}
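
The same embedded-label idea can probably be written more compactly, since annotate() accepts vectors: one call can place all the labels instead of one call per list element. This is a sketch under the assumption that the ldf below (made up for illustration) stands in for the real list; label_x and x_range are names introduced here, not part of the answer above.

```r
library(ggplot2)

# Made-up stand-in for the ldf built inside f30()
ldf <- list(
  Q = data.frame(Before = 1:5, After = c(3, 8, 2, 9, 4)),
  C = data.frame(Before = 1:5, After = c(7, 1, 6, 5, 10))
)
palette <- c(
  "#000000", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)
n <- length(ldf)

# Overall x range across all data frames, without combining them
x_range <- range(unlist(lapply(ldf, function(d) range(d$Before))))

# Divide by n + 1 to leave room on either side of the labels
label_x <- x_range[1] + seq_len(n) * diff(x_range) / (n + 1)

p <- ggplot(mapping = aes(x = Before, y = After))
for (i in seq_len(n)) {
  p <- p + geom_line(colour = palette[i], data = ldf[[i]])
}

# One vectorised annotate() call places every label along the bottom edge
p <- p + annotate(
  "text", x = label_x, y = -Inf, vjust = -1,
  label = paste("---", names(ldf)), colour = palette[seq_len(n)]
)

# print(p)  # uncomment to display the plot
```

As in the original fAnnotate, the labels can overlap the lines when the data reach the bottom of the plot area.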
