
I am looking for a way to automatically plot an arbitrary number of stat_function objects in a single ggplot, each with a different set of parameters, and to color each one differently.

Initially I thought of having one big data.table with a large number of samples from each distribution, each set tagged with an index, and then using geom_density, grouped and colored by the index. This is, however, very inefficient: there is, in my opinion, no need to spend time and memory producing and storing large sets of values when we already have parameters that describe each distribution exactly.

I present my initial solution below, but is there a more elegant and/or practical way of doing this?


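library(ggplot2)
library(data.table)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# apply() passes each row as a vector: x[1] = Shape, x[2] = Scale, x[3] = time
# (args, not arg: current ggplot2 versions only match the full argument name)
ggplot(data.table(x = 0:15), aes(x)) +
  apply(distrData.dt, 1, FUN = function(x) stat_function(fun = dgamma, args = list(shape = as.numeric(x[1]), scale = as.numeric(x[2])), mapping = aes_string(color = x[3]))) +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")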

This is the current result: (plot image not reproduced)

It produces the main result, that is, it will plot as many "perfect" densities as the number of parameter sets you give it. However, I am not using aesthetics to pass parameters from the column names ("Shape" and "Scale") or to get the color of each line. As far as I understand, that is not possible, but is there another way?
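One alternative that does use aesthetics, offered here as an assumption rather than something from the original thread: evaluate each density yourself on a fixed grid of x values (stat_function also evaluates its function on a grid, n = 101 points by default), which keeps only a small table per curve instead of large random samples, and then map colour to the time column with geom_line. The grid bounds (0 to 15) mirror the plot above; the names x_grid and curves.dt are hypothetical, and the sketch assumes time is unique per parameter set.

```r
library(ggplot2)
library(data.table)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# Hypothetical grid: 101 points, matching stat_function's default resolution
x_grid <- seq(0, 15, length.out = 101)

# One block of 101 rows per parameter set; Shape and Scale are referenced by
# column name, so the column order does not matter (assumes time is unique)
curves.dt <- distrData.dt[, .(x = x_grid,
                              density = dgamma(x_grid, shape = Shape, scale = Scale)),
                          by = time]

# Colour now comes from an ordinary aesthetic mapping on the time column
p <- ggplot(curves.dt, aes(x, density, colour = time, group = time)) +
  geom_line() +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
```

Printing p (or typing p at the console) draws the three curves. The trade-off is that you choose the grid yourself instead of letting stat_function derive it from the x scale.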

  • My only change would be somewhat stylistic: I'd drop apply and just use a for loop to fill a list with the individual layers. That will be much more readable, and preallocating the list is no problem since you know how big distrData is, so there's no speed penalty really. Commented Sep 9, 2015 at 21:55
  • @joran In the context of stylistic opinions, I'd think that apply is more "R-onic", but you make a good point on the for loop readability. Thank you for that. Commented Sep 9, 2015 at 22:32
  • Actually no, using apply on rows of a data.frame-ish structure is a big R anti-pattern. Commented Sep 9, 2015 at 22:37
  • @joran really? Can you direct me to a place where I can find this kind of "good practices"? Commented Sep 9, 2015 at 22:55
  • Mmm, that's just one that comes up over and over. People get caught by the fact that apply() coerces its argument to a matrix, forcing everything to one type. The "R-ish" function for this sort of task would usually be mapply(), which I tinkered with but it was no cleaner than what you did, hence my recommendation for a for loop. But it's just a recommendation. Commented Sep 10, 2015 at 0:09
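The coercion described in the last comment is easy to see in base R. This sketch uses a hypothetical data frame that mixes a numeric column with a made-up character column (Label): apply() first converts the data frame with as.matrix(), so every row arrives as a character vector, while indexing rows by position keeps each column's own type.

```r
# Hypothetical data: one numeric column plus one character column
df <- data.frame(Shape = c(2.1, 2.2), Label = c("a", "b"))

# apply() calls as.matrix(df) first; with a character column present the
# matrix is character, so Shape is no longer numeric inside the function
row_classes <- apply(df, 1, function(r) class(r[["Shape"]]))
print(row_classes)       # [1] "character" "character"

# Looping over row indices leaves the column types untouched
row_classes_loop <- vapply(seq_len(nrow(df)),
                           function(i) class(df$Shape[i]),
                           character(1))
print(row_classes_loop)  # [1] "numeric" "numeric"
```

In the question's table all columns are numeric, so the matrix stays numeric and the problem does not bite; it only appears once a non-numeric column is added.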

1 Answer


First of all, your solution is absolutely fine to me: it does the job, and it does it elegantly. I just want to expand on @joran's comment and show one useful trick called a "function factory", which is well suited to a case like yours.

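So I'm building a function that returns a function with fixed parameters. Note that calling force() prevents shape and scale from being lazily evaluated. That is necessary because we'll be calling the factory inside a for loop: ggplot2 only calls the returned functions when the plot is built, after the loop has finished, so unforced arguments would all pick up the last row's values.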
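The lazy-evaluation problem that force() guards against can be reproduced without ggplot2. In this base-R sketch (make_adder and make_adder_forced are hypothetical names), the unforced factory's argument stays a promise until the returned function is first called, which happens only after the loop has finished, so every closure sees the final value of i.

```r
# Without force(): n remains an unevaluated promise referring to the caller's i
make_adder <- function(n) function(x) x + n

# With force(): n is evaluated immediately, when the factory is called
make_adder_forced <- function(n) {
  force(n)
  function(x) x + n
}

lazy <- vector("list", 3)
forced <- vector("list", 3)
for (i in 1:3) {
  lazy[[i]] <- make_adder(i)
  forced[[i]] <- make_adder_forced(i)
}

# The closures are first called here, after the loop, when i is 3
lazy_results <- sapply(lazy, function(f) f(0))
forced_results <- sapply(forced, function(f) f(0))
print(lazy_results)    # [1] 3 3 3
print(forced_results)  # [1] 1 2 3
```

stat_function() is in the same position as the sapply() calls: the stored functions are only invoked when the plot is built.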

I'm using data.frame instead of data.table, but there shouldn't be a significant difference. The vector("list", n) construction preallocates a list of length n, as seen in ?list. I don't think it's strictly necessary in this particular case (significant overhead only appears for lengths of, say, more than 100, which is unlikely here), but it's always better to avoid growing objects iteratively; that's bad practice.
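For comparison, a small base-R sketch of the two patterns contrasted above (the squared values are placeholder content): appending to a list one element per iteration versus filling slots created up front with vector("list", n). Both produce the same list; only the preallocated version keeps the list's length fixed inside the loop.

```r
n <- 5

# Growing: the list's length changes on every iteration
grown <- list()
for (i in seq_len(n)) {
  grown[[length(grown) + 1]] <- i^2
}

# Preallocating: vector("list", n) creates n NULL slots up front
prealloc <- vector("list", n)
for (i in seq_len(n)) {  # seq_len() is safe for n = 0, unlike seq.int(0), which gives c(1, 0)
  prealloc[[i]] <- i^2
}

print(identical(grown, prealloc))  # [1] TRUE
```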

As a last remark, look at the stat_function call: it is reasonably readable, since you can see at a glance which part is the mapping and which part supplies the dgamma parameters.

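library(ggplot2)

# Function factory: returns a gamma density with shape and scale fixed
dgamma_factory <- function(shape, scale) {
  force(shape)  # evaluate now, not when ggplot2 later calls the function
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

# Preallocate one list slot per parameter set
l <- vector("list", nrow(distrData.dt))

# seq_len() instead of seq.int(): seq.int(0) would yield c(1, 0) for an empty table
for (i in seq_len(nrow(distrData.dt))) {
  params <- distrData.dt[i, ]
  l[[i]] <- stat_function(
    fun = dgamma_factory(params$Shape, params$Scale),
    mapping = aes_string(color = params$time))
}

# Adding a list of layers to a ggplot adds each layer in turn
ggplot(data.frame(x = 0:15), aes(x)) +
  l +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")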
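For readers who prefer the mapply() route joran mentions, here is a sketch (not from the original answer) that builds the same list of layers with Map(), the mapply() variant that always returns a list. Columns are passed to named arguments, so column order does not matter, and because args = list(...) stores plain values there is no lazy-evaluation issue to guard against. It assumes ggplot2 3.0.0 or later, where aes() accepts !! to inject a value, used here instead of the deprecated aes_string(); distr_df is the question's parameter table rebuilt as a data.frame.

```r
library(ggplot2)

distr_df <- data.frame(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# One layer per row; each column feeds the matching named argument
layers <- Map(
  function(shape, scale, time) {
    stat_function(
      fun = dgamma,
      args = list(shape = shape, scale = scale),  # values are stored immediately
      mapping = aes(colour = !!time)              # inject this row's time value
    )
  },
  shape = distr_df$Shape,
  scale = distr_df$Scale,
  time = distr_df$time
)

p <- ggplot(data.frame(x = 0:15), aes(x)) +
  layers +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
```

Whether this reads better than the for loop is a matter of taste; as joran noted, it is not dramatically cleaner.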

1 Comment

Thank you for the great explanation. The function factory makes for a longer answer, but it does exactly what I wanted, which was to explicitly define the parameters of the distribution without relying on the order of the columns in the data.frame.
