I've picked up the ggplot2 book, but I'm struggling to understand how data persists through layers.

For example, let's take a dataset and calculate the mean at each X value:

thePlot = ggplot( myDF , aes_string( x = "IndepentVar" , y = "DependentVar" ) )
thePlot = thePlot + stat_summary( fun.y = mean , geom = "point" )

How do I "access" the summary statistics in the next layer? For example, let's say I want to plot a smooth line over the dataset. This seems to work:

thePlot = thePlot + stat_smooth( aes( group = 1 ) , method = "lm" , geom = "smooth" , se = FALSE )

But let's say I also want to ignore a particular X value when generating the line. How do I reference the summarized dataset to exclude that X?

More generally, how is data referenced as it flows through layers? Am I always limited to the last statistics? Can I reference the original dataset?

  • Each layer, essentially consisting of a stat and a geom, is independent of the others, so there is no "persistence". If you want to re-use a summary statistic in a new layer, you'll have to add that summary again. (I can't think why you would want to do that, though.) If you want to create layers with subsets or different data, this needs to come from either a different data.frame or a different column in the data.frame. Post some example data and a better description of what you want to do... Commented Apr 18, 2011 at 15:53
  • Can you walk me through what data is expressed with stat_smooth in the example? How did it know to grab data from myDF? What exactly is "group=1"? How would I have known that aes supports "group"? It's not in the documentation. Commented Apr 18, 2011 at 15:57
  • Also, how does the ..var.. play into this? Commented Apr 18, 2011 at 15:57
  • Further, the documentation says that stat_smooth requires X/Y aes, but I didn't provide those and it still seems to work. Commented Apr 18, 2011 at 15:59
  • And if there's no persistence, then what does "New variables produced by the statistic" mean? Where can I use those new variables? Commented Apr 18, 2011 at 16:02
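
To make the "no persistence" point from the comments concrete, here is a minimal sketch. It assumes a recent ggplot2 (3.3.0 or later, where stat_summary takes fun instead of fun.y, and where layer_data() is available), and the data frame is a made-up stand-in for myDF. layer_data() returns the data a given layer ended up drawing, so it shows that the smooth is fitted to the raw rows inherited from ggplot(), not to the means computed by the first layer. Setting group = 1 simply puts all rows into one group so that a single line is fitted.

```r
library(ggplot2)

# Hypothetical stand-in for myDF (values invented for illustration)
myDF <- data.frame(
  IndepentVar  = rep(1:4, each = 3),
  DependentVar = c(1, 2, 3, 2, 4, 6, 3, 6, 9, 4, 8, 12)
)

# Both layers inherit the data and the x/y aesthetics from ggplot()
thePlot <- ggplot(myDF, aes(x = IndepentVar, y = DependentVar)) +
  stat_summary(fun = mean, geom = "point") +
  stat_smooth(aes(group = 1), method = "lm", formula = y ~ x, se = FALSE)

# Inspect what each layer computed, independently of the other
summary_layer <- layer_data(thePlot, 1)  # one row per x value: the means
smooth_layer  <- layer_data(thePlot, 2)  # fitted line from the raw rows

summary_layer[, c("x", "y")]
head(smooth_layer[, c("x", "y")])
```

The two data frames have different shapes because each stat starts again from the original rows; nothing computed in layer 1 is handed on to layer 2.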

1 Answer

Here is an attempt at answering your question:

  1. The data frame and aesthetics defined in the ggplot call are used as defaults in all subsequent layers unless a layer defines its own. That is why stat_smooth works without being given x and y.
  2. You can specify the data frame and aesthetics for each layer separately. For example, if you want to exclude some values of x when fitting the smooth, you can specify subset = .(x != xvalues) inside the geom_smooth call.
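
As a rough illustration of point 2, the sketch below gives the smoothing layer its own data frame through the layer's data argument, built with base R's subset(). This is a swapped-in alternative to the subset = .(...) form, which I believe newer ggplot2 releases no longer accept, so treat that as an assumption. The data values and the excluded_x name are made up for the example, and fun = mean assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for myDF, with unusually large values at x = 3
myDF <- data.frame(
  IndepentVar  = rep(1:4, each = 3),
  DependentVar = c(1, 2, 3, 3, 4, 5, 20, 21, 22, 7, 8, 9)
)

excluded_x <- 3  # hypothetical X value to leave out of the fit

thePlot <- ggplot(myDF, aes(x = IndepentVar, y = DependentVar)) +
  # Layer 1: uses the default data, so the mean at x = 3 is still drawn
  stat_summary(fun = mean, geom = "point") +
  # Layer 2: overrides the data for this layer only; x/y are still inherited
  stat_smooth(
    data = subset(myDF, IndepentVar != excluded_x),
    aes(group = 1), method = "lm", formula = y ~ x, se = FALSE
  )

points_layer <- layer_data(thePlot, 1)
smooth_layer <- layer_data(thePlot, 2)
```

The summary points still include x = 3, while the fitted line ignores it, because the data override applies only to the layer it is passed to.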

I can provide more detailed examples, if you have specific questions.

Hope this helps

3 Comments

Hi Ramnath - Is it too much to ask for a two- or three-layer example, each layer expressing one of the data concepts (i.e. aesthetic, subset, grouping, passing a computed var to a geom, etc.), with some comments about what is happening at each layer? The more verbose the better (i.e. keep stats and geoms separate, each layer clearly called out, etc.). I think this would be very helpful to other ggplot2 novices. Also, it wasn't clear to me whether subset is a type of aesthetic or relates to the data frame. Maybe the example would make this clear. Let me know if that's too vague of an ask.
@SFun: Sure, I can provide some examples that illustrate each of these ideas more clearly.
Fantastic! Perhaps you could also explain what the period in subset = .(x != xvalues) means. I searched the book but couldn't find that information. I understand the double period ..var.. but not the single period.
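
On the single period: as far as I know, the . in subset = .(x != xvalues) is the quoting function . exported by the plyr package, which ggplot2 relied on at the time. It captures the expression without evaluating it, so that it can be evaluated later against a data frame. The sketch below assumes plyr is installed and uses a made-up data frame; it shows the quoting mechanism on its own, not how ggplot2 applied it internally.

```r
library(plyr)

# Hypothetical data, for illustration only
df <- data.frame(x = 1:5)

# .() stores the condition unevaluated; x is not looked up yet
cond <- .(x != 3)
class(cond)

# Evaluate the stored expression with df's columns in scope
keep <- eval.quoted(cond, df)[[1]]

# Rows that satisfy the condition
kept <- df[keep, , drop = FALSE]
kept
```

So subset relates to the data frame (which rows a layer sees), not to the aesthetics.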
