I am trying to subset a layer of a plot where I am passing the data to ggplot through a pipe.

Here is an example:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(aes(x = Month, y = `Relative Value`))

This gives:

[Image: the stacked bar chart with a line drawn for each Indicator level]

I would like only one of those lines to appear, which I would be able to do if something like this worked in the geom_line layer:

  geom_line(subset = .(Indicator == 1), aes(x = Month, y = `Relative Value`))

Edit:

Session info:

R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] scales_0.3.0 lubridate_1.3.3 ggplot2_1.0.1 lazyeval_0.1.10 dplyr_0.4.3 RSQLite_1.0.0 readr_0.2.2
 [8] RJDBC_0.2-5 DBI_0.3.1 rJava_0.9-7

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 lattice_0.20-31
 [7] colorspace_1.2-6 R6_2.1.1 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 parallel_3.2.1
[13] grid_3.2.1 gtable_0.1.2 htmltools_0.2.6 yaml_2.1.13 assertthat_0.1 digest_0.6.8
[19] reshape2_1.4.1 memoise_0.2.1 rmarkdown_0.8.1 labeling_0.3 stringi_1.0-1 zoo_1.7-12
[25] proto_0.3-10

5 Comments
  • I don't get the same plot as you; my lines are scaled quite differently. Also, you should set a random seed so we can all work with the same plot. Commented Dec 26, 2015 at 8:42
  • @MikeWise sessionInfo and seed added. Commented Dec 26, 2015 at 8:45
  • @MikeWise Have just done that. Commented Dec 26, 2015 at 8:47
  • OK, reinitialized my workspace and the scale issue went away. It was some weird side effect of earlier ggplot calls. Commented Dec 26, 2015 at 8:49
  • @MikeWise Yeah, I figured. The piped data should clearly be available down the line to be used with subset, but the usual suspects such as . do not appear to work. @Hadley Halp? Commented Dec 26, 2015 at 8:52

3 Answers

tl;dr: Pass the data to that layer as a function that subsets the plot's data according to your criteria.


According to the ggplot2 documentation on layers, you have three options when passing data to a new layer:

  1. If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
  2. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.
  3. A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.

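The first two options are the most common, but the third is ideal when the data has been modified in a pipe: the piped result has no name you could refer to inside the layer, and a function that receives the plot data gives you access to it.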
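As a minimal sketch of the three options, here they are side by side on a made-up toy data frame (the names toy, x, y and g are illustrative and not from the question). layer_data() is used to check how many rows each layer actually receives:

```r
library(ggplot2)

# Hypothetical toy data: two groups of three points each
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 3, 2, 1),
  g = factor(rep(c("a", "b"), each = 3))
)

p <- ggplot(toy, aes(x = x, y = y, colour = g)) +
  # Option 1: data = NULL (the default), so the layer inherits all of toy
  geom_point() +
  # Option 2: an explicit data frame overrides the plot data
  geom_line(data = toy[toy$g == "b", ]) +
  # Option 3: a function is called with the plot data and returns the layer data
  geom_line(data = function(d) d[d$g == "a", ], linetype = "dashed")

# Number of rows each of the three layers ends up with
rows_per_layer <- sapply(1:3, function(i) nrow(layer_data(p, i)))
rows_per_layer
```

Only option 3 works without naming the data frame, which is why it suits a pipeline where the data never gets a name.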

In your example, adding data = function(x) subset(x, Indicator == 1) to geom_line() does the trick:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(data = function(x) subset(x,Indicator == 1), aes(x = Month, y = `Relative Value`))

This is the resulting plot:

[Image: the same chart with only the Indicator 1 line drawn]
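If the same kind of subset is needed in several layers, the anonymous function can be wrapped in a small helper. This is only a sketch: only_rows() is a hypothetical name, not part of ggplot2 or dplyr, and the example assumes the rlang package is installed (it is a dependency of dplyr). It uses the built-in mtcars data so it runs on its own:

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: captures a filter condition and returns a
# one-argument function that ggplot2 can call with the plot data
only_rows <- function(condition) {
  condition <- rlang::enquo(condition)
  function(plot_data) dplyr::filter(plot_data, !!condition)
}

# The returned function works on any data frame that has the column
four_cyl <- only_rows(cyl == 4)(mtcars)
nrow(four_cyl)

# Usage inside a layer: the line is drawn for the four-cylinder cars only
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(data = only_rows(cyl == 4))
```

In the pipeline from the question, the last layer would then read geom_line(data = only_rows(Indicator == 1), aes(x = Month, y = `Relative Value`)).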


1 Comment

How did I not know about the function option!

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(aes(x = Month, y = `Relative Value`,linetype=Indicator)) +
  scale_linetype_manual(values=c("1"="solid","2"="blank"))

yields:

[Image: the same chart with only the Indicator 1 line visible]

5 Comments

Haha, just hide one of the lines. I like it. Let me see if there is another way to do this, else will mark this. Still curious about how to access piped data (basically not explicitly named data) within ggplot layers.
Pipes don't give you much chance to hack the data on the fly.
Down votes don't give much feedback - especially 18 months after the fact. How about a comment instead so I have a chance to fix whatever is bothering you?
This answer doesn't answer the spirit of the question. It works for his particular case, but not in general.
Wtf does "spirit of the question" mean anyway? Seems like the OP would know better than you. Actually it looks to me like a change in something (either ggplot or R) has broken my answer - it simply does not work now like it used to. I might get around to fixing it on the weekend. If you had actually tried my code you would have seen that.

You might benefit from stat_subset(), a stat I made for my personal use that is available in metR: https://eliocamp.github.io/metR/articles/Visualization-tools.html#stat_subset

It has an aesthetic called subset that takes a logical expression and subsets the data accordingly.


library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
   group_by(Month) %>% 
   mutate(`Relative Value` = Value/sum(Value)) %>% 
   ungroup() %>% 
   ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
   geom_bar(position = "fill", stat = "identity") + 
   theme_bw()+ 
   scale_y_continuous(labels = percent_format()) + 
   metR::stat_subset(aes(x = Month, y = `Relative Value`, subset = Indicator == 1), 
               geom = "line")
