How to programmatically overlap arbitrary stat_functions in ggplot?

Question

I am looking for a way to automatically plot an arbitrary number of stat_function objects in a single ggplot, each with its own set of parameters, and to color each of them.

Initially I thought of having one big data.table with a large number of samples from each distribution, each set associated with an index, and using geom_density, grouping and coloring by the index. This is, however, very inefficient. There is, in my opinion, no need to spend time and memory to produce and keep large sets of values if we already have parameters that perfectly describe each distribution.

I present my initial solution below, but is there a more elegant and/or practical way of doing this?

library(data.table)
library(ggplot2)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

ggplot(data.table(x = 0:15), aes(x)) +
  # apply() passes each row as a numeric vector: x[1] = Shape, x[2] = Scale, x[3] = time
  apply(distrData.dt, 1, FUN = function(x) stat_function(fun = dgamma, args = list(shape = as.numeric(x[1]), scale = as.numeric(x[2])), mapping = aes_string(color = x[3]))) +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")

This is the current result: [figure: the three gamma density curves, colored from blue to red by time step]

It produces the main result, that is, it will plot as many "perfect" densities as the number of parameter sets you give it. However, I am not using aesthetics to pass parameters from the column names ("Shape" and "Scale") or to get the color of each line. As far as I understand, that is not possible, but is there another way?

My only change would be somewhat stylistic: I'd drop apply and just use a for loop to fill a list with the individual layers. That will be much more readable, and preallocating the list is no problem since you know how big distrData is, so there's no real speed penalty.
– joran, Commented Sep 9, 2015 at 21:55
@joran In the context of stylistic opinions, I'd think that apply is more "R-onic", but you make a good point about the readability of the for loop. Thank you for that.
– MeloMCR, Commented Sep 9, 2015 at 22:32
Actually no, using apply on the rows of a data.frame-ish structure is a big R anti-pattern.
– joran, Commented Sep 9, 2015 at 22:37
@joran Really? Can you direct me to a place where I can find this kind of "good practices"?
– MeloMCR, Commented Sep 9, 2015 at 22:55
Mmm, that's just one that comes up over and over. People get caught by the fact that apply() coerces its argument to a matrix, forcing everything to one type. The "R-ish" function for this sort of task would usually be mapply(), which I tinkered with, but it was no cleaner than what you did, hence my recommendation for a for loop. But it's just a recommendation.
– joran, Commented Sep 10, 2015 at 0:09
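
To illustrate the coercion joran describes, here is a minimal base-R sketch. The `label` column is a hypothetical addition (the question's table is all numeric, which is why apply() happens to work there), and `Map()` is used as the list-returning relative of `mapply()`.

```r
# Same Shape/Scale values as the question, plus a hypothetical character
# column to make the coercion visible.
params <- data.frame(Shape = c(2.1, 2.2, 2.3),
                     Scale = c(1.1, 1.2, 1.3),
                     label = c("a", "b", "c"))

# apply() converts the data frame to a matrix first, so with one character
# column every value in a row arrives as a string.
row_types <- apply(params, 1, function(row) class(row[["Shape"]]))

# Map() walks the columns in parallel and keeps each column's own type.
densities_at_1 <- Map(function(shape, scale) dgamma(1, shape = shape, scale = scale),
                      params$Shape, params$Scale)
```

With a mixed-type table, `row_types` is `"character"` for every row, while the `Map()` version receives plain numbers and also refers to the columns by name rather than by position.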

tonytonov · Accepted Answer · 2015-09-10 09:34:15Z


First of all, your solution is absolutely fine to me: it does the job, and it does it elegantly. I just wanted to expand on @joran's comment and show one useful trick called a "function factory", which is well suited to a case like yours.

So I'm building a function that returns a function with fixed parameters. Note that calling force makes shape and scale get evaluated immediately instead of lazily, which is necessary because we'll be creating the functions inside a for loop.
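
To see why force matters in a for loop, here is a small standalone sketch. The `make_adder_lazy` and `make_adder_forced` names are hypothetical and are not part of the answer's code; they only isolate the lazy-evaluation behavior.

```r
# Without force(): n stays an unevaluated promise pointing at `i`.
make_adder_lazy <- function(n) function(x) x + n

# With force(): n is evaluated when the factory is called.
make_adder_forced <- function(n) {
  force(n)
  function(x) x + n
}

lazy <- vector("list", 3)
forced <- vector("list", 3)
for (i in 1:3) {
  lazy[[i]] <- make_adder_lazy(i)
  forced[[i]] <- make_adder_forced(i)
}

lazy[[1]](10)    # 13: n is looked up only now, when i is already 3
forced[[1]](10)  # 11: n was fixed at 1 when the closure was created
```

The same thing would happen to shape and scale: without force, every layer built in the loop could end up using the parameters from the last row.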

I'm using data.frame instead of data.table, but there shouldn't be a significant difference. The vector("list", n) construction preallocates space for a list, as seen in ?list. I don't think it's required in this particular case (significant overhead appears only for lengths of, say, >100, which is unlikely here), but it's always better to avoid growing objects iteratively, which is bad practice.
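
As a minimal illustration of the preallocation point (the `squares` and `grown` names are made up for this sketch), both loops below produce the same list; only the first one knows its final length up front.

```r
n <- 5

# Preallocated: the list has its final length before the loop starts.
squares <- vector("list", n)
for (i in seq_len(n)) squares[[i]] <- i^2

# Grown: the list is extended by one element per iteration, which may
# force R to reallocate and copy it as it gets longer.
grown <- list()
for (i in seq_len(n)) grown[[length(grown) + 1]] <- i^2

identical(squares, grown)  # TRUE
```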

As a last remark, check the stat_function call: it seems reasonably readable, since you can see at a glance which part is the mapping and which part belongs to the dgamma parameters.

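dgamma_factory <- function(shape, scale) {
  # force() evaluates the arguments now, so each closure keeps its own values
  force(shape)
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

# Preallocate one slot per parameter set
l <- vector("list", nrow(distrData.dt))

# seq_len() is safe even if the table has zero rows
for (i in seq_len(nrow(distrData.dt))) {
  params <- distrData.dt[i, ]
  l[[i]] <- stat_function(
    fun = dgamma_factory(params$Shape, params$Scale),
    mapping = aes_string(color = params$time))
}

# Adding a list of layers to a plot adds each layer in turn
ggplot(data.frame(x = 0:15), aes(x)) +
  l +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")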


MeloMCR Over a year ago

Thank you for the great explanation. The function factory gives a longer answer, but it does exactly what I wanted, which was to explicitly define the parameters for the distribution without relying on the order of columns from the data.frame.
