gnuplot scale plot function to same height

Question

I am drawing distribution curves of three different datasets. They have different means and standard deviations, and thus different curves. However, when plotted in the same graph, the curves end up with very different heights.

I use the normal curve function:

std_b=0.1674
mu_b=.6058
mu_j=0.8955
std_j=0.0373
mu_s=0.9330
std_s=0.0240
normal(x,mu,sd) = (1/(sd*sqrt(2*pi)))*exp(-(x-mu)**2/(2*sd**2))
plot normal(x,mu_b,std_b) w boxes title "Boolean",\
normal(x,mu_j,std_j) w boxes title "Jaccard",\
normal(x,mu_s,std_s) w boxes title "Sorensen"

However, the scale of the curves is off, as seen by the difference along the Y axis. How can I scale each plotted function so that they all reach the same Y height?

To have all curves at the same height you would simply need to drop the factor before the exp. But then the result is wrong, because those are probability density functions, which are normalized such that the integral is 1.
– Christoph, Commented Jan 1, 2016 at 19:06
@Christoph so there's really no way? Either leave them as they are, or draw them separately?
– Ælex, Commented Jan 1, 2016 at 20:22
Well, hard to say. Depends on what you want to emphasize. You could of course write somewhere that you plot pdf*sigma*sqrt(2*pi), but I don't know how that fits into your field.
– Christoph, Commented Jan 1, 2016 at 21:18

Matthew · Accepted Answer · 2016-01-23 08:49:31Z


In general, you can't.

These are probability density functions, which means that they must be non-negative and must have an area of exactly 1 under the curve (the formal definition is a little more technical, but that is the statistics 101 definition). Because of that, when you make the curve less spread out (that is, when the standard deviation gets smaller), the peak in the middle must get higher in order to preserve the area.
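To put rough numbers on this for the curves in the question: the normal density is largest at x = mu, where the exponential equals 1, so the peak height is just the prefactor. The values below are rounded and computed from the mu/std parameters posted in the question, so treat them as a quick check rather than exact figures.

$$
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}, \qquad
f_{\text{Boolean}}(\mu_b) \approx \frac{1}{0.1674 \cdot 2.5066} \approx 2.38, \quad
f_{\text{Jaccard}}(\mu_j) \approx \frac{1}{0.0373 \cdot 2.5066} \approx 10.7, \quad
f_{\text{Sorensen}}(\mu_s) \approx \frac{1}{0.0240 \cdot 2.5066} \approx 16.6
$$

So the Sorensen curve peaks roughly seven times higher than the Boolean one, which is why the Boolean curve looks flat when all three share one Y axis.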

If it helps to visualize it, think of a finite distribution in the shape of an isosceles triangle.

Both the purple and green triangles form perfectly valid probability distributions. The purple distribution has a base of length 10 (from 0 to 10) and a height of 1/5, giving an area of 1/2 * 10 * 1/5 = 1. If I want to make it cover a smaller range (which again is basically what the standard deviation is doing in your normal curves), I push the sides together (in this case to a base of length 6, from 2 to 8), but in order to preserve the area of 1, I have to make the triangle taller (in this case a height of 1/3, since 1/2 * 6 * 1/3 = 1). If I kept the same height, the area would be less than 1.

In your normal distributions, the y height is controlled by the scale factor in front of the exponential function, 1/(sd*sqrt(2*pi)). Getting rid of that factor, or setting it to the same value for all three curves, will make them have the same height, but they will no longer be probability distributions, as the area will not be 1. In general, for a normal distribution, the smaller the standard deviation, the taller the peak.
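If equal heights matter more for the figure than showing true densities, a minimal sketch of the "drop the factor" variant could look like the following. The function name peak_normal, the xrange, the sample count, and the axis label are my own choices for illustration, not something from the question; the parameters are the ones posted there. The Y axis should be labelled accordingly, since these curves are pdf*sd*sqrt(2*pi) rather than densities.

```gnuplot
# Parameters copied from the question
mu_b = 0.6058; std_b = 0.1674
mu_j = 0.8955; std_j = 0.0373
mu_s = 0.9330; std_s = 0.0240

# Gaussian WITHOUT the 1/(sd*sqrt(2*pi)) prefactor (hypothetical helper name).
# Its value at x = mu is exactly 1, so every curve peaks at the same height.
# Note: this is no longer a probability density; the area is sd*sqrt(2*pi), not 1.
peak_normal(x, mu, sd) = exp(-(x - mu)**2 / (2 * sd**2))

# Assumed plotting range and resolution; adjust to your data
set xrange [0:1.2]
set samples 1000
set ylabel "density relative to peak"

plot peak_normal(x, mu_b, std_b) with lines title "Boolean", \
     peak_normal(x, mu_j, std_j) with lines title "Jaccard", \
     peak_normal(x, mu_s, std_s) with lines title "Sorensen"
```

This keeps the position and width of each curve intact and only discards the information carried by the peak height, so it is worth stating the rescaling in the caption.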

edited Jan 23, 2016 at 8:49

answered Jan 2, 2016 at 6:56


1 Comment

Ælex Over a year ago

Makes perfect sense, as I was just hoping of presenting them in a more appealing way, but I guess this will have to do. Many thanks!
