
I am drawing distribution curves of three different datasets. They have different means and standard deviations, and thus different curves. However, when I plot them on the same graph, the curves end up with very different heights.

I use the normal curve function:

std_b=0.1674
mu_b=.6058
mu_j=0.8955
std_j=0.0373
mu_s=0.9330
std_s=0.0240
normal(x,mu,sd) = (1/(sd*sqrt(2*pi)))*exp(-(x-mu)**2/(2*sd**2))
plot normal(x,mu_b,std_b) w boxes title "Boolean",\
normal(x,mu_j,std_j) w boxes title "Jaccard",\
normal(x,mu_s,std_s) w boxes title "Sorensen"

However, the scale of the curves is off, as seen by the difference in height along the Y axis. How can I scale each plotted function so that they all reach the same Y height?


  • To have all curves at the same height you would simply need to drop the factor before the exp. But then the result is wrong, because those are probability density functions, which are normalized such that the integral is 1. Commented Jan 1, 2016 at 19:06
  • @Cristoph so there’s really no way, either leave them as they are, or draw them separately? Commented Jan 1, 2016 at 20:22
  • Well, hard to say. Depends on what you want to emphasize. You could of course write somewhere that you plot pdf*sigma*sqrt(2*pi), but I don't know how that fits into your field. Commented Jan 1, 2016 at 21:18

1 Answer


In general, you can't.

These are probability density functions, which means that they must be non-negative and must have an area of exactly 1 under the curve (the formal definition is a little more technical, but that is the statistics 101 definition). Because of that, when you make the curve less spread out (the spread is what the standard deviation measures), you must make the peak in the middle higher in order to preserve the area.
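To put numbers on that trade-off, here is the peak height of each curve, worked out from the `normal(x,mu,sd)` formula and the parameter values in the question (values rounded to two decimals):

```latex
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 1,
\qquad
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}

f_b(\mu_b) = \frac{1}{0.1674\,\sqrt{2\pi}} \approx 2.38, \quad
f_j(\mu_j) = \frac{1}{0.0373\,\sqrt{2\pi}} \approx 10.70, \quad
f_s(\mu_s) = \frac{1}{0.0240\,\sqrt{2\pi}} \approx 16.62
```

So the "Sorensen" curve peaks roughly seven times higher than the "Boolean" one, purely because its standard deviation is about seven times smaller.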

If it helps to visualize it, think of a finite distribution in the shape of an isosceles triangle.

Sample Distributions

Both the purple and green triangles form perfectly valid probability distributions. The purple distribution has a base of length 10 (from 0 to 10) and a height of 1/5, giving an area of 1. If I want to make it cover a smaller range (which again is basically what the standard deviation is doing in your normal curves), I push the sides together (in this case to a length of 6, from 2 to 8), but in order to preserve the area of 1, I have to make the triangle taller (in this case a height of 1/3). If I kept the same height, the area would be less than 1.
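The arithmetic behind the two triangles, using the usual area formula (half of base times height), can be written out as:

```latex
A_{\text{purple}} = \tfrac{1}{2} \cdot 10 \cdot \tfrac{1}{5} = 1,
\qquad
A_{\text{green}} = \tfrac{1}{2} \cdot 6 \cdot \tfrac{1}{3} = 1

% Narrower base but the original height of 1/5: the area falls short of 1.
A_{\text{same height}} = \tfrac{1}{2} \cdot 6 \cdot \tfrac{1}{5} = \tfrac{3}{5} < 1
```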

In your normal distributions, the y height is controlled by the scale factor in front of the exponential, 1/(sd*sqrt(2*pi)). Getting rid of that factor, or setting it to the same value for all three curves, will make them have the same height, but they will no longer be probability distributions, as the area will not be 1. For a normal distribution, the smaller the standard deviation, the taller the peak.
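If equal heights matter more than showing true densities, a minimal sketch along the lines suggested in the comments could look like the following. The helper name `shape`, the axis ranges, the sample count, and the switch to `with lines` are my own assumptions and are not from the question. What gets plotted is pdf*sd*sqrt(2*pi), so the y-axis label says so.

```gnuplot
# Sketch only: peak-normalized curves, NOT probability densities.
# The area under each curve is sd*sqrt(2*pi) instead of 1.
mu_b = 0.6058; std_b = 0.1674
mu_j = 0.8955; std_j = 0.0373
mu_s = 0.9330; std_s = 0.0240

# Hypothetical helper: the normal pdf multiplied by sd*sqrt(2*pi).
# Only the exponential remains, so every curve reaches 1 at x = mu.
shape(x, mu, sd) = exp(-(x - mu)**2 / (2 * sd**2))

# Assumed ranges; adjust to your data.
set xrange [0:1.2]
set yrange [0:1.1]
set samples 1000
set ylabel "pdf * sd * sqrt(2*pi)"

plot shape(x, mu_b, std_b) with lines title "Boolean", \
     shape(x, mu_j, std_j) with lines title "Jaccard", \
     shape(x, mu_s, std_s) with lines title "Sorensen"
```

All three curves then top out at 1, and the differences in width remain visible. Whether this rescaling is acceptable depends on what the figure is meant to show, so it is worth stating it in the caption.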


1 Comment

Makes perfect sense. I was just hoping to present them in a more appealing way, but I guess this will have to do. Many thanks!
