
I am trying to produce weighted density plots in R with the ggplot2 package and save them as .png files. My code produces between 100 and 1000 of these plots, each for different geographical coordinates.

The problem is that ggsave becomes very slow even when the data set has only about 1500 points: saving a single plot then takes approximately 100 s. From what I understand, the inefficiency comes from the fact that the ggplot2 objects I'm plotting are grid graphics objects, so ggsave has to print (render) them before saving them.

So my question is: is there any way to make saving these ggplot2 objects more efficient, other than lowering the resolution of the kde2d density estimate (which would indeed make the data frame to be plotted smaller)?

I have provided a minimal working example that produces one of the .png files. If you wrap the ggsave call in system.time(), you will see that it takes around 100 s.

library(MASS)
library(ggplot2)
library(grid)


x <- runif(1550, 0, 100)
y <- runif(1550, 0, 100)
wg <- runif(1550, 0, 1)

data <- data.frame(x, y, wg)


source("C:/Users/cpt2avo/Documents/R/kde2dweighted.r")
dens <- kde2d.weighted(data$x, data$y, data$wg)
dfdens <- data.frame(expand.grid(x=dens$x, y=dens$y), z=as.vector(dens$z))

p <- ggplot(data, aes(x = x, y = y)) +
  stat_contour(data = dfdens, geom = "polygon", bins = 20, alpha = 0.2,
               aes(x = x, y = y, z = z, fill = ..level..)) +
  scale_fill_continuous(low = "green", high = "red") +
  scale_alpha(range = c(0,1), limits = c(0.5, 1), na.value = 0) +
  labs(x = NULL, y = NULL) +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank(),
        plot.margin = unit(c(0,0,-0.5,-0.5), "line"),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        panel.margin = unit(c(0,0,0,0), "mm"),
        legend.position = "none",
        plot.background = element_rect(fill = "transparent", colour = NA),
        panel.background = element_blank())

system.time(ggsave(p, file = "C:/Users/cpt2avo/Documents/R/example.png", width = 2, height = 2, units = "in", dpi = 128))

kde2d.weighted is a function for computing weighted 2D kernel density estimates:

kde2d.weighted <- function(x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  nx <- length(x)
  if (length(y) != nx)
    stop("data vectors must be the same length")
  # Fill in default weights before validating them, so a missing w does not raise an error
  if (missing(w))
    w <- rep(1, nx)
  if (length(w) != nx && length(w) != 1)
    stop("weight vectors must be 1 or length of data")
  gx <- seq(lims[1], lims[2], length.out = n) # gridpoints x
  gy <- seq(lims[3], lims[4], length.out = n) # gridpoints y
  if (missing(h))
    h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
  h <- h/4
  ax <- outer(gx, x, "-")/h[1] # distance of each point to each grid point in x-direction
  ay <- outer(gy, y, "-")/h[2] # distance of each point to each grid point in y-direction
  # z is the density: weighted kernel values in x times kernel values in y, normalised by the total weight
  z <- (matrix(rep(w, n), nrow = n, ncol = nx, byrow = TRUE) * matrix(dnorm(ax), n, nx)) %*%
    t(matrix(dnorm(ay), n, nx)) / (sum(w) * h[1] * h[2])
  return(list(x = gx, y = gy, z = z))
}
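As a sanity check on the weighted estimator, the following sketch recomputes the same formula inline (so that it runs on its own) and compares it with MASS::kde2d when all weights are equal. The data, the seed, and the variable names are made up for illustration; with equal weights the normalising term sum(w) reduces to the number of points, so the two results are expected to agree.

```r
library(MASS)

set.seed(1)                      # arbitrary seed, for reproducibility only
x <- runif(200, 0, 100)
y <- runif(200, 0, 100)
w <- rep(1, length(x))           # equal weights (illustrative)

n  <- 25
h  <- c(bandwidth.nrd(x), bandwidth.nrd(y)) / 4
gx <- seq(min(x), max(x), length.out = n)   # grid points in x
gy <- seq(min(y), max(y), length.out = n)   # grid points in y

kx <- dnorm(outer(gx, x, "-") / h[1])       # n x length(x) kernel values in x
ky <- dnorm(outer(gy, y, "-") / h[2])       # n x length(y) kernel values in y

# Multiply column j of kx by w[j], then combine with ky and normalise
z_weighted <- (sweep(kx, 2, w, "*") %*% t(ky)) / (sum(w) * h[1] * h[2])

# Unweighted reference from MASS on the same grid
z_plain <- kde2d(x, y, n = n)$z

same <- isTRUE(all.equal(z_weighted, z_plain))
print(same)
```

If the comparison fails on your machine, that would point at the density code rather than at ggplot2.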
  • Using your code above, it takes my system 0.1 seconds. Does that exact same code really take 100s? Or are you creating a higher-resolution plot or one with more grid points or something? Commented Jul 8, 2014 at 8:12
  • @Spacedman I ran exactly the same code multiple times, but I get a run time of about 100 s on average. Maybe the problem lies somewhere else then... Commented Jul 8, 2014 at 8:57
  • How long does it take to do a ggsave of a simple ggplot scatterplot with ten dots in it (should eliminate some problem with ggsave)? How long does it take to display one of these grid things on screen? Commented Jul 8, 2014 at 9:07
  • The system.time() statistics for the ten-dot scatterplot were user = 0.24, system = 0.05, elapsed = 0.28. Displaying the grid plot that the code I posted above produces (using print(p)) took 100 s. So it seems that it's not saving the .png, but displaying the ggplot2 object, that is really slow for me. Commented Jul 8, 2014 at 9:36
  • Does the slowdown only kick in when you have over 1500 points? Or is it gradual? It's really weird and I suspect something is wrong with your installation - are you on R 3.1 and the latest version of every package? Commented Jul 9, 2014 at 7:33
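The diagnosis in the comments (is it the saving or the drawing that is slow?) can be taken one step further by timing the stages of rendering separately. This is only a rough sketch: it uses a small stand-in scatterplot and a temporary output file, both of which are assumptions for illustration; substitute the contour plot p from the question to measure the real case.

```r
library(ggplot2)
library(grid)

set.seed(1)
df <- data.frame(x = runif(200), y = runif(200))   # stand-in data
p  <- ggplot(df, aes(x, y)) + geom_point()         # stand-in plot

out_file <- file.path(tempdir(), "timing_example.png")  # hypothetical output path

# Stage 1: compute statistics, scales and layout data
t_build <- system.time(built <- ggplot_build(p))

# Stage 2: convert the built plot into a table of grid grobs
t_gtable <- system.time(gt <- ggplot_gtable(built))

# Stage 3: draw the grobs on a png device
png(out_file, width = 2, height = 2, units = "in", res = 128)
t_draw <- system.time(grid.draw(gt))
invisible(dev.off())

timings <- c(build  = t_build[["elapsed"]],
             gtable = t_gtable[["elapsed"]],
             draw   = t_draw[["elapsed"]])
print(timings)
```

If nearly all of the time falls in one stage, that narrows down whether the statistics, the grob construction, or the graphics device is responsible.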

1 Answer


@AntonvSchantz I ran into the same problem as you did and had a very similar experience. Indeed, it's exporting to high-resolution PNG via ggsave() that makes this process slow. My solution was to export to PDF instead, by doing something like:

Above your plot creation:

pdf(paste("plots/my_filename", rn, ".pdf", sep = ""), width = 11, height = 8)

Below your plot creation:

dev.off()
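Put together, the approach in this answer might look like the sketch below. It assumes that rn is a loop counter over the plots (the answer does not say what it is), uses a stand-in scatterplot, and writes to a temporary directory instead of "plots/". Note that inside a loop a ggplot object has to be printed explicitly, otherwise nothing is drawn on the device.

```r
library(ggplot2)

out_dir <- file.path(tempdir(), "plots_example")   # hypothetical output directory
dir.create(out_dir, showWarnings = FALSE)

pdf_files <- character(0)
for (rn in 1:3) {                                  # rn assumed to be a plot index
  df <- data.frame(x = runif(50), y = runif(50))   # stand-in data
  p  <- ggplot(df, aes(x, y)) + geom_point()       # stand-in plot

  out_file <- file.path(out_dir, paste("my_filename", rn, ".pdf", sep = ""))
  pdf(out_file, width = 11, height = 8)            # open the device above the plot
  print(p)                                         # explicit print is required in a loop
  dev.off()                                        # close the device below the plot

  pdf_files <- c(pdf_files, out_file)
}
print(basename(pdf_files))
```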
