
I have two data sets, of sizes 500 and 1000, and I want to plot the density of both in a single plot.
I searched on Google and found some related threads.

The data sets in those threads all have the same size:

df <- data.frame(x = rnorm(1000, 0, 1), y = rnorm(1000, 0, 2), z = rnorm(1000, 2, 1.5))

But my data sets have different sizes, so I assume I should normalize the data first in order to compare the densities between data sets.

Is it possible to make a density plot for data sets of different sizes in ggplot2?

  • I think density plots scale the data to area = 1 by default, so there is no need to correct for sample size. Someone correct me if I'm wrong. Commented Dec 7, 2017 at 4:32
  • @neilfws Yes, I think these data have been scaled. But I don't know whether they were scaled one by one or together. Commented Dec 7, 2017 at 4:53
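The scaling question raised in these comments can be checked numerically. The following is a minimal sketch using base R's `density()`, which is the estimator `geom_density()` relies on as far as I know; the helper `area_under()` and the variable names are made up for illustration. It integrates each density estimate separately with the trapezoidal rule, so an area close to 1 for each sample suggests the curves are scaled one by one, regardless of sample size.

```r
set.seed(1)
a <- rnorm(1000, 0, 2)  # larger sample
b <- rnorm(500, 1, 1)   # smaller sample

# Hypothetical helper: trapezoidal area under a density estimate
area_under <- function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

area_a <- area_under(density(a))
area_b <- area_under(density(b))

# Both values should be close to 1, independent of sample size
print(c(area_a, area_b))
```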

1 Answer

By default, all densities are scaled to unit area. If you have two datasets with different amounts of data, you can plot them together like so:

library(ggplot2)

# Two samples of different sizes; each curve is scaled to unit area on its own
df1 <- data.frame(x = rnorm(1000, 0, 2))
df2 <- data.frame(y = rnorm(500, 1, 1))

ggplot() + 
  geom_density(data = df1, aes(x = x), 
               fill = "#E69F00", color = "black", alpha = 0.7) + 
  geom_density(data = df2, aes(x = y),
               fill = "#56B4E9", color = "black", alpha = 0.7)


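However, from your latest comment, I take it that's not what you want. Instead, you want the areas under the density curves to be scaled relative to the amount of data in each group. You can do that by mapping y to the computed count variable, which is the density multiplied by the number of points in the group (written ..count.. in older ggplot2 versions and after_stat(count) from ggplot2 3.3.0 on):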

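library(ggplot2)

# Stack both samples into one data frame with a group label
df1 <- data.frame(x = rnorm(1000, 0, 2), label = rep('df1', 1000))
df2 <- data.frame(x = rnorm(500, 1, 1), label = rep('df2', 500))
df <- rbind(df1, df2)

# after_stat(count) is density * n per group; on ggplot2 < 3.3.0 use y = ..count..
ggplot(df, aes(x, y = after_stat(count), fill = label)) + 
  geom_density(color = "black", alpha = 0.7) + 
  scale_fill_manual(values = c("#E69F00", "#56B4E9"))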

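To see what count scaling does to the areas, here is a rough numerical check in base R. It assumes that the count variable is simply the density multiplied by the group size; `area_under()` and the variable names are hypothetical. Under that assumption, the area under each curve is roughly the group size, so the larger group should account for about two thirds of the total area.

```r
set.seed(1)
a <- rnorm(1000, 0, 2)
b <- rnorm(500, 1, 1)

# Hypothetical helper: trapezoidal area under a curve given as x/y vectors
area_under <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

d_a <- density(a)
d_b <- density(b)

# Count-scaled curves: density times the number of observations in the group
area_a <- area_under(d_a$x, d_a$y * length(a))
area_b <- area_under(d_b$x, d_b$y * length(b))

# Share of the total area taken by each group
share_a <- area_a / (area_a + area_b)
share_b <- area_b / (area_a + area_b)
print(c(share_a, share_b))
```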


4 Comments

It looks like the two data sets are each scaled to 1, but I want to combine the two data sets first and then scale the combined data. Is it possible?
You'll have to be more precise in describing what it is you want to do. There are many different ways one could "combine" data, and none that I can think of would make sense in this context. Is there anything wrong with the plot I made?
Well, I want to combine the two data sets with df1 <- data.frame(x = rnorm(1000, 0, 2), label=rep('df1', 1000)); df2 <- data.frame(x = rnorm(500, 1, 1), label=rep('df2', 500)); df=rbind(df1, df2), so that df contains both df1 and df2. Now I want to plot the density grouped by label. I think the area under the curve for df1 should be 2/3 of the total, and for df2 1/3.
If no legend is desired, how does one apply colors? Can we do this? fill=c("#E69F00", "#56B4E9") inside geom_density()?
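On the last comment: a fixed `fill` inside `geom_density()` is a single setting for the whole layer, so one option that should work is to keep `fill` mapped to the group and suppress the legend with `show.legend = FALSE`. This is only a sketch under that assumption; the named color vector and the object name `p` are chosen here for illustration.

```r
library(ggplot2)

set.seed(1)
df <- rbind(
  data.frame(x = rnorm(1000, 0, 2), label = "df1"),
  data.frame(x = rnorm(500, 1, 1), label = "df2")
)

# fill stays mapped to label so each group gets its own color;
# show.legend = FALSE hides the legend for this layer
p <- ggplot(df, aes(x, y = after_stat(count), fill = label)) +
  geom_density(color = "black", alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c(df1 = "#E69F00", df2 = "#56B4E9"))

print(p)
```

Passing `guide = "none"` to `scale_fill_manual()` is an alternative way to drop the legend for the fill scale.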
