I have a ggplot2 line chart built from three data frames. I set the colors manually, so color identifies the series within each data frame, and I use linetype to distinguish the data frames from one another. Because neither color nor linetype is mapped inside aes(), no legend is generated automatically. How can I create a legend for this plot?

tpAct <- data.frame(
  Date=seq.Date(as.Date('2017-09-01'), as.Date('2018-01-01'),by='month'),
  Reg1=rnorm(5, 10, 5),
  Reg2=rnorm(5, 15, 5),
  Reg3=rnorm(5, 20, 5),
  Reg4=rnorm(5, 25, 5),
  Reg5=rnorm(5, 30, 5),
  Total=rnorm(5, 60, 5)
)

tpOL <- data.frame(  
  Date=seq.Date(as.Date('2017-09-01'), as.Date('2018-01-01'),by='month'),
  Reg1=rnorm(5, 10, 5),
  Reg2=rnorm(5, 25, 5),
  Reg3=rnorm(5, 20, 5),
  Reg4=rnorm(5, 25, 5),
  Reg5=rnorm(5, 30, 5),
  Total=rnorm(5, 60, 5)
)

tpModL2 <- data.frame(  
  Date=seq.Date(as.Date('2017-09-01'), as.Date('2018-01-01'),by='month'),
  Reg1=rnorm(5, 10, 5),
  Reg2=rnorm(5, 25, 5),
  Reg3=rnorm(5, 20, 5),
  Reg4=rnorm(5, 25, 5),
  Reg5=rnorm(5, 30, 5),
  Total=rnorm(5, 60, 5)
)

library(ggplot2)

ggplot() + 
  geom_line(data=tpAct, aes(x=Date, y=Reg1), color='red', size=1.25) +
  geom_line(data=tpAct, aes(x=Date, y=Reg2), color='blue', size=1.25) + 
  geom_line(data=tpAct, aes(x=Date, y=Reg3), color='green', size=1.25) + 
  geom_line(data=tpAct, aes(x=Date, y=Reg4), color='pink', size=1.25) + 
  geom_line(data=tpAct, aes(x=Date, y=Reg5), color='yellow', size=1.25) + 
  geom_line(data=tpAct, aes(x=Date, y=Total), color='black', size=1.25) + 
  geom_line(data=tpOL, aes(x=Date, y=Reg1), linetype=5, color='red', size=1.25) +
  geom_line(data=tpOL, aes(x=Date, y=Reg2), linetype=5, color='blue', size=1.25) +
  geom_line(data=tpOL, aes(x=Date, y=Reg3), linetype=5, color='green', size=1.25) +
  geom_line(data=tpOL, aes(x=Date, y=Reg4), linetype=5, color='pink', size=1.25) +
  geom_line(data=tpOL, aes(x=Date, y=Reg5), linetype=5, color='yellow', size=1.25) +
  geom_line(data=tpOL, aes(x=Date, y=Total), linetype=5, color='black', size=1.25) + 
  geom_line(data=tpModL2, aes(x=Date, y=Reg1), linetype=4, color='red', size=1.25) +
  geom_line(data=tpModL2, aes(x=Date, y=Reg2), linetype=4, color='blue', size=1.25) +
  geom_line(data=tpModL2, aes(x=Date, y=Reg3), linetype=4, color='green', size=1.25) +
  geom_line(data=tpModL2, aes(x=Date, y=Reg4), linetype=4, color='pink', size=1.25) +
  geom_line(data=tpModL2, aes(x=Date, y=Reg5), linetype=4, color='yellow', size=1.25) +
  geom_line(data=tpModL2, aes(x=Date, y=Total), linetype=4, color='black', size=1.25) +
  labs(x='', y='Total Balances ($B)')

  • I think if you properly format your data in "long" format, you can just map the data source to linetype and you'll have your legend. Commented Sep 26, 2017 at 18:57
  • You can do this plot with a single call to geom_line. To do that, (1) convert the individual data frames to long format, (2) stack the individual data frames into a single data frame and add an indicator column to mark the name of the source data frame. Then you can map the source data frame to linetype and Reg to color, which will give you a legend and drastically reduce the amount of code needed. If you provide samples of each of your three data frames, we can provide code to show you how to do this. Commented Sep 26, 2017 at 18:57
  • @GauravBansal you have to post your data Commented Sep 26, 2017 at 19:01
  • I appreciate the suggestion to convert the data frames to long format, but is there a way to put in a legend without doing that? That would actually take a lot more work due to how the data is set up. Commented Sep 26, 2017 at 19:01
  • It probably won't take much work. If you provide data samples, we can show you how. Commented Sep 26, 2017 at 19:02
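
On the question of getting a legend without reshaping: one option, sketched here under the assumption that the three data frames keep their wide layout, is to move linetype inside aes() as a constant label in each layer and assign the actual line types with scale_linetype_manual(). Only the Reg1 column is shown, the values are made up, and make_df and src_lty are illustrative names.

```r
library(ggplot2)

set.seed(42)
dates <- seq.Date(as.Date("2017-09-01"), as.Date("2018-01-01"), by = "month")

# Stand-ins for the question's wide data frames (illustrative values only)
make_df <- function() data.frame(Date = dates, Reg1 = rnorm(5, 10, 5))
tpAct <- make_df()
tpOL <- make_df()
tpModL2 <- make_df()

# Line type per source, matching the question (solid, 5, 4)
src_lty <- c(tpAct = 1, tpOL = 5, tpModL2 = 4)

# A constant string inside aes() is treated as a mapped value, so it gets a
# legend entry; colour stays outside aes() and remains a fixed setting
p <- ggplot() +
  geom_line(data = tpAct,   aes(Date, Reg1, linetype = "tpAct"),   colour = "red") +
  geom_line(data = tpOL,    aes(Date, Reg1, linetype = "tpOL"),    colour = "red") +
  geom_line(data = tpModL2, aes(Date, Reg1, linetype = "tpModL2"), colour = "red") +
  scale_linetype_manual(name = "Source", values = src_lty, breaks = names(src_lty))
```

This still needs one layer per column and data frame, and it produces a linetype legend only; the colors stay unlabeled.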

2 Answers

Here's how to stack and plot the data using the sample data frames you provided:

library(tidyverse)

setNames(list(tpAct, tpOL, tpModL2), c("tpAct","tpOL","tpModL2")) %>% 
  map_df(~ .x %>% gather(key, value, -Date), .id="source") %>% 
  ggplot(aes(Date, value, colour=key, linetype=source)) +
    geom_line() +
    scale_colour_manual(values=c('red','blue','green','pink', 'yellow', 'black')) +
    theme_classic()

setNames(list(tpAct, tpOL, tpModL2), c("tpAct","tpOL","tpModL2")) puts the three data frames in a list and assigns the data frame names as the names of the list elements.

map_df(~ .x %>% gather(key, value, -Date), .id="source") converts the individual data frames to long format and stacks them into a single long-format data frame.
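
In current tidyverse releases, gather() and map_df() are marked as superseded (they still work). A roughly equivalent sketch using dplyr::bind_rows() and tidyr::pivot_longer() is shown below; the small data frames are made-up stand-ins for the ones in the question, and make_df and long are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
# Stand-ins for the question's data frames (two value columns, made-up values)
make_df <- function() {
  data.frame(
    Date  = seq.Date(as.Date("2017-09-01"), as.Date("2018-01-01"), by = "month"),
    Reg1  = rnorm(5, 10, 5),
    Total = rnorm(5, 60, 5)
  )
}
tpAct <- make_df()
tpOL <- make_df()
tpModL2 <- make_df()

# bind_rows() on a named list stacks the frames; .id stores each list name
# in a "source" column. pivot_longer() then moves every column except
# source and Date into key/value pairs.
long <- bind_rows(list(tpAct = tpAct, tpOL = tpOL, tpModL2 = tpModL2), .id = "source") %>%
  pivot_longer(-c(source, Date), names_to = "key", values_to = "value")

p <- ggplot(long, aes(Date, value, colour = key, linetype = source)) +
  geom_line()
```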

A faceted plot might be easier to read:

setNames(list(tpAct, tpOL, tpModL2), c("tpAct","tpOL","tpModL2")) %>% 
  map_df(~ .x %>% gather(key, value, -Date), .id="source") %>% 
  ggplot(aes(Date, value, colour=key)) +
    geom_line() +
    scale_colour_manual(values=c('red','blue','green','pink', 'yellow', 'black')) +
    theme_classic() +
    facet_grid(~ source)

2 Comments

Thanks. Is there a way to combine the legend into one so it says "tpActReg1" and shows as red solid line, "tpModL2Reg1" and shows a red dashed line, and so forth?
You can do this with ggplot(aes(Date, value, colour=interaction(source,key), linetype=interaction(source,key))), but the legend will have 18 entries and you'll need to do some additional work with scale_colour_manual and scale_linetype_manual to get the linetypes and colors the way you want them.
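
A minimal sketch of that "additional work", assuming long-format data with source, key, and value columns as in the answer above. It uses paste0() instead of interaction() so the labels read "tpActReg1" as requested, and it builds named lookup vectors so each combined label gets its key's color and its source's line type. The data values are made up, only three keys are shown, and key_cols, src_lty, and series are illustrative names.

```r
library(ggplot2)

set.seed(1)
dates <- seq.Date(as.Date("2017-09-01"), as.Date("2018-01-01"), by = "month")
sources <- c("tpAct", "tpOL", "tpModL2")
keys <- c("Reg1", "Reg2", "Total")

# Long-format stand-in: one row per source x key x date (made-up values)
long <- expand.grid(Date = dates, key = keys, source = sources,
                    stringsAsFactors = FALSE)
long$value <- rnorm(nrow(long), 20, 5)
long$series <- paste0(long$source, long$key)  # e.g. "tpActReg1"

# Colour depends on the key, line type on the source
key_cols <- c(Reg1 = "red", Reg2 = "blue", Total = "black")
src_lty <- c(tpAct = 1, tpOL = 5, tpModL2 = 4)

# Expand both lookups to one entry per combined label
combos <- unique(long[c("source", "key", "series")])
series_cols <- setNames(key_cols[combos$key], combos$series)
series_lty <- setNames(src_lty[combos$source], combos$series)

# Mapping colour and linetype to the same variable lets ggplot2 merge
# the two legends into one
p <- ggplot(long, aes(Date, value, colour = series, linetype = series)) +
  geom_line() +
  scale_colour_manual(values = series_cols) +
  scale_linetype_manual(values = series_lty)
```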

When you find yourself wanting to manually add a legend with ggplot2, I've found it usually means you're going about making your plot in a way other than what ggplot2 intended.

To get ggplot to generate a legend for this plot, you need to reshape your data into a long format with groups before giving it to ggplot.

You'll want to combine all three data sets (tpAct(), tpOL(), tpModL2(); I'm assuming these are functions that return data frames) into a single data.frame; let's call it combineddata. Its columns would be: dataset (which data set the observation comes from), Date, RegType (Reg1, Reg2, ..., Total), and Value (the y values you're plotting).
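
One way to build such a data frame in base R is sketched below. It assumes the three sources are plain wide data frames with a Date column, as in the question's sample data; to_long is a hypothetical helper, and the values are made up.

```r
set.seed(1)
dates <- seq.Date(as.Date("2017-09-01"), as.Date("2018-01-01"), by = "month")

# Stand-ins for the three wide data frames (two value columns, made-up values)
make_df <- function() data.frame(Date = dates, Reg1 = rnorm(5, 10, 5), Total = rnorm(5, 60, 5))
tpAct <- make_df()
tpOL <- make_df()
tpModL2 <- make_df()

# Hypothetical helper: reshape one wide data frame to long format and tag
# every row with the name of the data set it came from
to_long <- function(df, name) {
  value_cols <- setdiff(names(df), "Date")
  data.frame(
    dataset = name,
    Date    = rep(df$Date, times = length(value_cols)),
    RegType = rep(value_cols, each = nrow(df)),
    # unlist() concatenates the columns in order, matching RegType above
    Value   = unlist(df[value_cols], use.names = FALSE)
  )
}

combineddata <- rbind(
  to_long(tpAct, "tpAct"),
  to_long(tpOL, "tpOL"),
  to_long(tpModL2, "tpModL2")
)
```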

Then you could create a plot with something like the following:

ggplot(combineddata, aes(x=Date, y=Value, color=RegType, linetype=dataset)) + 
    geom_line()

This will draw one line for each combination of RegType and dataset, like you have above, but will do so automatically and create a legend. If you don't like the colors and line types that ggplot picks for you, you can set the ones you want with scales:

ggplot(combineddata, aes(x=Date, y=Value, color=RegType, linetype=dataset)) + 
    geom_line() +
    scale_color_manual(values = c("red", "blue", "green", "pink", "yellow", "black")) + 
    scale_linetype_manual(values = c(1, 5, 4))

You may need to convert RegType and dataset to factors with explicitly ordered levels before sending the data to ggplot, so that the values pair up with the intended colors and line types, or else change the order of the colors and line types above.
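
A short sketch of why the order matters, using a tiny made-up stand-in for combineddata: character columns are converted to factors with alphabetically sorted levels, so an unnamed values = c(1, 5, 4) would be assigned to tpAct, tpModL2, tpOL in that order, which swaps the last two relative to the question.

```r
# Tiny stand-in for combineddata (made-up values)
combineddata <- data.frame(
  dataset = c("tpOL", "tpAct", "tpModL2"),
  RegType = c("Total", "Reg1", "Reg2"),
  Value   = c(60, 10, 15)
)

# Default level order is alphabetical: tpAct, tpModL2, tpOL
default_levels <- levels(factor(combineddata$dataset))

# Setting levels explicitly fixes the order in which manual scale values apply
combineddata$dataset <- factor(combineddata$dataset,
                               levels = c("tpAct", "tpOL", "tpModL2"))
combineddata$RegType <- factor(combineddata$RegType,
                               levels = c("Reg1", "Reg2", "Total"))

# Alternative that does not depend on level order: name the scale values,
# e.g. scale_linetype_manual(values = c(tpAct = 1, tpOL = 5, tpModL2 = 4))
```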
