
I would like to make a plot similar to this one for the iris data, with summary statistics printed on the plot: https://i.sstatic.net/Zyv2s.jpg

I am following this post over here: How to add summary statistics in histogram plot using ggplot2?

df <- iris
df.m <- melt(df, id="Species")

#Calculating the summary statistics
summ <- df.m %>% 
  group_by(variable) %>% 
  summarize(min = min(value), max = max(value), 
            mean = mean(value), q1= quantile(value, probs = 0.25), 
            median = median(value), q3= quantile(value, probs = 0.75),
            sd = sd(value))

I then modified the code to make density plots instead of histograms:

p1 <- ggplot(df.m) + geom_density(aes(x = value), fill = "grey", color = "black") + 
    facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()

I seem to be having a problem over here:

p1+geom_density(data=summ,label =split(summ,summ$variable),
npcx = 0.00, npcy = 1, hjust = 0, vjust = 1,size=2)

Does anyone know what the problem is? Also, is it possible to accomplish this with only ggplot2? I am working with a computer where I do not have the admin privileges to download many libraries (I have reshape2, dplyr, ggplot2). Should this be done using the annotate() function in ggplot2? And is there a way to change the x-axis for each graph to "log"?

1 Answer

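Since you only have a few packages available, I would suggest the following approach. You can add the summary as a text layer with geom_text(), although you may need to adjust the position of the text for each facet. geom_text() with its own data frame is a better fit than annotate() here, because annotate() repeats the same text in every facet. A log transformation is also possible if you apply log() to the variable inside aes(). I will show two ways to place the annotations.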

library(ggplot2)
library(dplyr)
library(reshape2)  # provides melt()

#Data
df <- iris
df.m <- melt(df, id="Species")

Here, we create the annotations:

#Calculating the summary statistics and create the label
summ <- df.m %>% 
  group_by(variable) %>% 
  summarize(min = min(value), max = max(value), 
            mean = mean(value), q1= quantile(value, probs = 0.25), 
            median = median(value), q3= quantile(value, probs = 0.75),
            sd = sd(value)) %>%
  mutate_if(is.numeric, round, digits=2) %>%
  mutate(lab = paste("min = ", min, "\nmax = ", max, "\nmean = ", mean, 
                    "\nq1 = ", q1, "\nmedian = ", median, "\nq3 = ", q3, "\nsd = ", sd),
         position=c(1.5, 0.8, 0.25, -2)) %>% select(variable, lab, position)
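Before plotting, it can help to check what one of these labels looks like once the "\n" line breaks are rendered. The following is only a sketch in base R on a single iris column; the helper name make_label is made up for illustration and is not part of any package.

```r
# Hypothetical helper: builds a multi-line label of summary statistics
# for one numeric vector, similar in spirit to the lab column above
make_label <- function(x, digits = 2) {
  stats <- c(min = min(x), max = max(x), mean = mean(x),
             q1 = unname(quantile(x, probs = 0.25)),
             median = median(x),
             q3 = unname(quantile(x, probs = 0.75)),
             sd = sd(x))
  # One "name = value" entry per line
  paste(names(stats), "=", round(stats, digits), collapse = "\n")
}

lab <- make_label(iris$Sepal.Length)
cat(lab, "\n")  # cat() renders the line breaks; print() would show "\n" literally
```

Each label should come out as seven short lines, which is roughly the vertical space the text needs at the top of a panel.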

The position column created above sets the x position of each label, in the same order as the facets. Because the plot uses log(value), these positions are on the log scale; change them to move the labels. With that, the code for the plot is:

#Plot
p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") + 
  facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()
p1 <- p1 + geom_text(data = summ, aes(x=position, label = lab), y=Inf, hjust=1, vjust=1.2, size=3)
p1

The output:

[Image: density plots of log(value) for the four iris measurements, each panel annotated with its summary statistics]
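Rather than tuning the four position values by hand, one option is to derive them from the data, for example by taking the largest log(value) in each facet. This is only a sketch under that assumption; the name pos is introduced here for illustration.

```r
library(reshape2)
library(dplyr)

df.m <- melt(iris, id = "Species")

# One x position per facet: the right-hand end of the log-transformed data
pos <- df.m %>%
  group_by(variable) %>%
  summarize(position = max(log(value)))

pos
```

Joining pos to summ by variable (for instance with left_join()) would replace the hard-coded vector. With hjust = 1 the text then ends at the right edge of the data in each panel.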

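In that version the x position of each annotation comes from summ. If you would rather not set positions by hand, you can pin the label to the top-right corner of every panel with x = Inf and y = Inf: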

p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") + 
  facet_wrap(~variable, scales="free", ncol = 2) + theme_bw()
p1 <- p1 + geom_text(data = summ, aes(label = lab), x = Inf, y = Inf, hjust = 1, vjust = 1.2, size = 3)
p1

The output:

[Image: the same density plots, with the summary statistics in the top-right corner of each panel]

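Either option works. As for why your code failed: geom_density() only draws the density curve and has no label, npcx or npcy arguments. Those arguments probably come from a geom in an add-on package used in the post you followed, which may in turn rely on grid and gridExtra, so it is not available with ggplot2 alone.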
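On the log-axis question: wrapping the variable in log() inside aes() changes the plotted values, so the tick labels show log units. An alternative that stays within ggplot2 is scale_x_log10(), which puts the axis on a log scale but labels the ticks in the original units. The sketch below assumes a shortened label (mean and sd only) to keep it compact, and note that it uses base 10 rather than the natural log used above.

```r
library(ggplot2)
library(reshape2)
library(dplyr)

df.m <- melt(iris, id = "Species")

# Shortened label per facet (an assumption for brevity; the full label works the same way)
summ <- df.m %>%
  group_by(variable) %>%
  summarize(lab = paste0("mean = ", round(mean(value), 2),
                         "\nsd = ", round(sd(value), 2)))

p2 <- ggplot(df.m) +
  geom_density(aes(x = value), fill = "grey", color = "black") +
  # x = Inf and y = Inf pin the text to the top-right corner of each panel
  geom_text(data = summ, aes(label = lab), x = Inf, y = Inf,
            hjust = 1, vjust = 1.2, size = 3) +
  # Log-scaled x axis with tick labels in the original units
  scale_x_log10() +
  facet_wrap(~variable, scales = "free", ncol = 2) +
  theme_bw()
p2
```

The density is still estimated on the transformed values, so the curve shape should match what log10(value) inside aes() would give; only the axis labelling differs.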
