
I am using the density function in R and then computing some results from the obtained densities. After that, I use ggplot2 to display the PDFs of the same data.

However, the results are slightly different from what is shown in the respective plot, which is confirmed by plotting the density output directly (using plot {graphics}).

Any idea why? How can I correct it so that the results and the ggplot2 plot match, i.e. are computed from exactly the same data?

An example of this (code and images):

library(ggplot2)

srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)

# Bandwidth estimated from all values pooled together
bwidth <- density(srcdata$Value)$bw

# One numeric vector per group (Negative, Positive)
sample <- split(srcdata$Value, srcdata$Type)[1:2]

xmin <- min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax <- max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

# Per-group densities, all forced to use the pooled bandwidth
densities <- lapply(sample, density, bw = bwidth, n = 512, from = xmin, to = xmax)

# plotting densities result
plot(densities[[1]], xlim = c(xmin, xmax), col = "steelblue", main = "")
lines(densities[[2]], col = "orange")

# plot using ggplot2 (no bandwidth specified, so the default is used)
ggplot(data = srcdata, aes(x = Value)) +
  geom_density(aes(group = Type, colour = Type)) +
  xlim(xmin, xmax)

# or with ggplot2 via the easyGgplot2 package
library(easyGgplot2)
ggplot2.density(data = srcdata, xName = 'Value', groupName = 'Type',
                alpha = 0.5, xlim = c(xmin, xmax))

[Image: output of the plots above]

  • They appear to be using different bandwidths for the Gaussian kernel. If you want them to be the same, you need to specify the same bandwidth. Commented Sep 28, 2015 at 21:22
  • Yes, you're changing the defaults when calculating the densities yourself, but not when using geom_density. Commented Sep 29, 2015 at 5:35

1 Answer

The current comments correctly identify that you are using two different bandwidths to calculate the densities in your two plots. The plot() graph uses the bwidth you specified, while the ggplot() graph uses the default bandwidth, which is estimated separately for each group. Ideally you would pass bwidth to the ggplot graph and that would solve everything. However, the commentary on a related SO question suggests that you can't pass a bandwidth parameter to stat_density or geom_density.
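To see the mismatch directly, the base-R sketch below prints the pooled bandwidth stored in bwidth next to the default bandwidth that density() computes for each group on its own. The data are re-created from the question, and `pooled_bw` and `group_bw` are illustrative names.

```r
# Data re-created from the question
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)

# Bandwidth estimated from all 20 values pooled (what the question passes as bw)
pooled_bw <- density(srcdata$Value)$bw

# Default bandwidth estimated separately within each group
group_bw <- sapply(split(srcdata$Value, srcdata$Type),
                   function(x) density(x)$bw)

print(c(pooled = pooled_bw, group_bw))
```

The pooled value (about 0.59) is roughly four to five times either per-group value. The pooled bandwidth is much larger because the two groups sit far apart, which inflates the spread of the combined data.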

The easiest way to get the same output in both graphs is to let density() choose its default bandwidth, both in your manual density calculation (below) and in the ggplot graph (using the same code you already have):

densities <- lapply(sample, density, n = 512, from = xmin, to = xmax)
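When no bw is given, density() falls back to bw = "nrd0", which applies bw.nrd0() to each group's values. "nrd0" is also the default bw of stat_density() in ggplot2 2.1.0 and later. The base-R sketch below checks that the bandwidth stored in each per-group result is exactly bw.nrd0() of that group. It re-creates the question's data; `used_bw` and `rule_bw` are illustrative names.

```r
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = rep(NA_character_, 20)
)
srcdata$Type <- c("Positive", "Negative", "Positive", "Negative", "Negative",
                  "Positive", "Positive", "Negative", "Positive", "Positive",
                  "Negative", "Negative", "Negative", "Negative", "Positive",
                  "Positive", "Positive", "Negative", "Positive", "Negative")

sample <- split(srcdata$Value, srcdata$Type)
xmin <- min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax <- max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

# No bw given: each group gets its own rule-of-thumb bandwidth
densities <- lapply(sample, density, n = 512, from = xmin, to = xmax)

# The bandwidth stored in each result equals bw.nrd0() of that group's values
used_bw <- sapply(densities, `[[`, "bw")
rule_bw <- sapply(sample, bw.nrd0)
print(rbind(used_bw, rule_bw))
```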

Alternatively, the bandwidth actually used by geom_density/stat_density is the default bandwidth multiplied by the adjust parameter (see the density documentation). You could therefore pass an adjust value to geom_density (the parameter is documented under stat_density) to try to scale the ggplot bandwidth to match your bwidth variable. I found that an adjust value of 4.5 gives a similar (but not exact) version of the original graph produced with your calculated densities:
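In stats::density(), adjust multiplies whichever bandwidth was chosen, and the returned bw field reports the adjusted value. The base-R sketch below applies adjust = 4.5 to each group and compares the result with bwidth. Because each group starts from a different default bandwidth, a single adjust value overshoots bwidth for one group and undershoots it for the other. That is consistent with the close-but-not-exact match. The data are re-created from the question, and `adjusted_bw` is an illustrative name.

```r
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)
sample <- split(srcdata$Value, srcdata$Type)
bwidth <- density(srcdata$Value)$bw

adjust <- 4.5
# density() multiplies its default (nrd0) bandwidth by `adjust`
adjusted_bw <- sapply(sample, function(x) density(x, adjust = adjust)$bw)

# Row "target" repeats the pooled bandwidth for comparison
print(rbind(adjusted = adjusted_bw, target = bwidth))
```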

ggplot(data = srcdata, aes(x=Value)) + 
    geom_density(aes(group=Type, colour=Type), adjust = 4.5) +
    xlim(xmin, xmax)

[Image: Adjusted ggplot density graph]

EDIT: If you specifically want your ggplot graph to use your bwidth variable as the bandwidth for the density smoothing, you may find the answer to this question helpful: Understanding bandwidth smoothing in ggplot2
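This answer predates ggplot2 2.1.0, which added a bw argument to stat_density(). With a newer ggplot2, the pooled bandwidth can be passed straight through, so both graphs use the same numeric bandwidth for every group. The sketch below assumes ggplot2 2.1.0 or later and skips the plotting part if ggplot2 is not installed. Its data are re-created from the question.

```r
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)
sample <- split(srcdata$Value, srcdata$Type)
bwidth <- density(srcdata$Value)$bw
xmin <- min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax <- max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

# Manual densities with the pooled bandwidth, as in the question
densities <- lapply(sample, density, bw = bwidth, n = 512, from = xmin, to = xmax)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # bw = bwidth applies the same numeric bandwidth to every group
  p <- ggplot(srcdata, aes(x = Value, colour = Type)) +
    geom_density(bw = bwidth) +
    xlim(xmin, xmax)
  print(p)
}
```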


1 Comment

You're right, thanks! I was using the bw obtained from all the samples (0.5902679) and forcing it in the plot. However, I'm plotting two curves (one per group in the sample data). When no bw is specified, the plot uses each group's own default bandwidth, and the lower of the two is 0.1232133. Thus adjust = 0.5902679/0.1232133 = 4.79062 matches the curve for that group, i.e. adj = bwidth / min(density(sample[[1]])$bw, density(sample[[2]])$bw)
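Because geom_density() estimates a default bandwidth per group, the adjust factor that reproduces bwidth differs between the groups. The formula above, with min(), gives the factor for the group whose default bandwidth is smallest. The base-R sketch below computes both factors. It re-creates the question's data; `adj` and `adj_single` are illustrative names.

```r
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)
sample <- split(srcdata$Value, srcdata$Type)
bwidth <- density(srcdata$Value)$bw

# Factor that turns each group's default bandwidth into the pooled one
adj <- bwidth / sapply(sample, bw.nrd0)
print(adj)

# The single value from the comment: the factor for the group with the
# smallest default bandwidth (here the largest factor)
adj_single <- bwidth / min(sapply(sample, bw.nrd0))
```

Since adjust is one value per layer, matching both curves with adjust alone would presumably need a separate geom_density() layer per group, each drawn on that group's subset with its own factor.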
