Visualizing relative frequency in R / ggplot2

Question

I've been trying to work out how to visualize a set of relative frequencies so that they are easy to compare with each other. The distributions don't differ hugely, which is itself something I'd like the plot to show. I've managed to create a relatively simple point plot, but I don't think it looks good enough.

The code is straightforward (albeit unfinished as far as visual tweaks are concerned), I guess:

library(ggplot2)
copuladeletion <- read.table(text = "Type    Distribution    Family
                             NP  0.39344 Austronesian    
                             NP  0.30232 Mon-Khmer
                             NP  0.3125  Tai-Kadai
                             NP  0.29230 Sinitic
                             NP  0.26785 Other
                             AdjP    0.44262 Austronesian
                             AdjP    0.53488 Mon-Khmer
                             AdjP    0.625   Tai-Kadai
                             AdjP    0.55384 Sinitic
                             AdjP    0.58928 Other
                             AdvP    0.03278 Austronesian
                             AdvP    0.00000 Mon-Khmer
                             AdvP    0.00000 Tai-Kadai
                             AdvP    0.04615 Sinitic
                             AdvP    0.07142 Other
                             EX  0.01639 Austronesian
                             EX  0.02325 Mon-Khmer
                             EX  0.00000 Tai-Kadai
                             EX  0.03076 Sinitic
                             EX  0.01785 Other
                             Clause  0.08196 Austronesian
                             Clause  0.02325 Mon-Khmer
                             Clause  0.0625  Tai-Kadai
                             Clause  0.03076 Sinitic
                             Clause  0.05357 Other
                             Other   0.01639 Austronesian
                             Other   0.11627 Mon-Khmer
                             Other   0.00000 Tai-Kadai
                             Other   0.04615 Sinitic
                             Other   0.00000 Other", header = TRUE)
ggplot(copuladeletion) + geom_point(aes(Distribution, Type, colour=Family,size=1))

Which yields the following image:

So, my questions are:

Do you think this visualization works well enough? Are there any preferable options over a simple point plot for these data?

Thank you very much in advance!

You can put size outside of aes so it doesn't get mapped to the legend. Since you have some overlapping points, consider adding a small amount of jitter (geom_jitter()).
– Roman Luštrik, Commented Mar 29, 2016 at 9:33
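
A minimal sketch of what this comment suggests, not code posted by the commenter. The data frame is rebuilt inline from the question's table so the snippet runs on its own; the name p_jitter, the point size and the jitter height of 0.15 are arbitrary choices made for illustration.

```r
library(ggplot2)

# Same values as the table in the question, built inline so the sketch is self-contained.
copuladeletion <- data.frame(
  Type = rep(c("NP", "AdjP", "AdvP", "EX", "Clause", "Other"), each = 5),
  Family = rep(c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other"), times = 6),
  Distribution = c(0.39344, 0.30232, 0.3125, 0.29230, 0.26785,
                   0.44262, 0.53488, 0.625, 0.55384, 0.58928,
                   0.03278, 0.00000, 0.00000, 0.04615, 0.07142,
                   0.01639, 0.02325, 0.00000, 0.03076, 0.01785,
                   0.08196, 0.02325, 0.0625, 0.03076, 0.05357,
                   0.01639, 0.11627, 0.00000, 0.04615, 0.00000)
)

# size is a constant here, so it sits outside aes() and produces no legend entry.
# width = 0 leaves the x values (the actual data) untouched; height only spreads
# the points vertically inside each Type row (0.15 is an assumed, tweakable value).
p_jitter <- ggplot(copuladeletion, aes(Distribution, Type, colour = Family)) +
  geom_jitter(size = 3, width = 0, height = 0.15)

p_jitter
```

Jittering only along the categorical axis keeps the plotted frequencies readable at their true positions while still separating points that coincide.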

hrbrmstr · Accepted Answer · 2016-03-29 14:34:02Z

3

Perhaps just another take on your strip charts:

library(ggplot2)

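# txt is assumed to hold the table text from the question
# (the string passed to read.table() there)
copuladeletion <- read.table(text=txt, header=TRUE)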

gg <- ggplot(copuladeletion) 
gg <- gg + geom_point(aes(Distribution, Type, colour=Family),
                      shape="|", size=10)
gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
gg <- gg + scale_y_discrete(expand=c(0,0))
gg <- gg + scale_colour_brewer(name="", palette="Set1")
gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y")
gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.position="bottom")
gg

To slightly compensate for the overlaps (as Roman has pointed out a couple of times), you can use a proper line instead of a hack-y point:

gg <- ggplot(copuladeletion) 
gg <- gg + geom_segment(aes(x=Distribution, xend=Distribution,
                            y=0, yend=1, colour=Family), size=0.25)
gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
gg <- gg + scale_y_discrete(expand=c(0,0))
gg <- gg + scale_colour_brewer(name="", palette="Set1")
gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y", switch="y")
gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_rect(color="#2b2b2b", size=0.15))
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text.y=element_text(angle=180))
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.position="bottom")
gg

You can add an aesthetic to map linetype as well (and hjust the y labels as you like). These thin lines are kinda hard to read (so tweak size at will as well), but I do think a strip chart works pretty well for this data. You may want to "zoom out" the EX strip in a separate plot if you need to (I have no idea what this data is really trying to say :-)
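
A rough sketch of the two suggestions in the paragraph above, under some assumptions: the helper strip_plot() and the names p_lines, ex_only and p_ex are made up for illustration, the line thickness of 0.6 is arbitrary, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for line thickness). The data are rebuilt inline from the question's table.

```r
library(ggplot2)

# Same values as the table in the question, built inline so the sketch is self-contained.
copuladeletion <- data.frame(
  Type = rep(c("NP", "AdjP", "AdvP", "EX", "Clause", "Other"), each = 5),
  Family = rep(c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other"), times = 6),
  Distribution = c(0.39344, 0.30232, 0.3125, 0.29230, 0.26785,
                   0.44262, 0.53488, 0.625, 0.55384, 0.58928,
                   0.03278, 0.00000, 0.00000, 0.04615, 0.07142,
                   0.01639, 0.02325, 0.00000, 0.03076, 0.01785,
                   0.08196, 0.02325, 0.0625, 0.03076, 0.05357,
                   0.01639, 0.11627, 0.00000, 0.04615, 0.00000)
)

# Hypothetical helper: one strip per Type, one vertical line per Family.
# Family is mapped to both colour and linetype; because both scales share the
# default title "Family", ggplot2 merges them into a single legend.
strip_plot <- function(data) {
  ggplot(data) +
    geom_segment(aes(x = Distribution, xend = Distribution, y = 0, yend = 1,
                     colour = Family, linetype = Family),
                 linewidth = 0.6) +  # assumes ggplot2 >= 3.4.0
    scale_y_continuous(expand = c(0, 0)) +
    facet_wrap(~Type, ncol = 1) +
    labs(x = NULL, y = NULL) +
    theme_bw() +
    theme(axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          legend.position = "bottom")
}

# All types, with linetype as a second cue for lines that nearly coincide.
p_lines <- strip_plot(copuladeletion)

# Separate plot for the EX strip only, so its narrow range fills the x axis.
ex_only <- subset(copuladeletion, Type == "EX")
p_ex <- strip_plot(ex_only)

p_lines
p_ex
```

Because the EX plot is built from its own subset, its x axis is scaled to that subset alone, which is what makes the small differences visible.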

edited Mar 29, 2016 at 14:34

answered Mar 29, 2016 at 11:34

hrbrmstr

3 Comments

Roman Luštrik Over a year ago

Why so many assignments? Is assigning even necessary? It just means more typing and is perhaps less readable. Compared with the points proposed by the OP, this visualization has the downside that it cannot be jittered.

hrbrmstr Over a year ago

Super great opinion @RomanLuštrik :-) I believe/opine (and can prove speed-wise in the context of composition) this idiom makes it easier to edit ggplot2 plots and I make pretty good plots. I'm not sure jittering is necessary either. If precision is required, another plot with a zoomed-in view can be generated. If folks don't like the assigns, translate to one giant, error-prone, comma-separated theme() parameter list.

S. Leu Over a year ago

This looks awesome, thank you so much! Once I'm done being excited about how cool this is, I'll also try to understand the coding behind it better. :-)

mtoto · Answer · 2016-03-29 12:52:15Z

1

As far as I understand, you are plotting relative frequencies within each family, so as an alternative to your plot, we could visualize the proportion of Type within each Family using a 100% stacked bar chart.

ggplot(copuladeletion, aes(x = Family, y = Distribution, fill = Type)) +
  geom_bar(stat = "identity", position= "fill") +
  scale_y_continuous("Proportion") +
  scale_x_discrete("", expand = c(0, 0)) +
  coord_flip()
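
A quick sanity check that is not part of the original answer: position = "fill" rescales every bar to a total of 1, so it is worth confirming that the values within each Family already add up to roughly 1. The sketch below rebuilds the question's table inline and uses base R only; the name family_totals is an assumption made for illustration.

```r
# Same values as the table in the question, built inline so the sketch is self-contained.
copuladeletion <- data.frame(
  Type = rep(c("NP", "AdjP", "AdvP", "EX", "Clause", "Other"), each = 5),
  Family = rep(c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other"), times = 6),
  Distribution = c(0.39344, 0.30232, 0.3125, 0.29230, 0.26785,
                   0.44262, 0.53488, 0.625, 0.55384, 0.58928,
                   0.03278, 0.00000, 0.00000, 0.04615, 0.07142,
                   0.01639, 0.02325, 0.00000, 0.03076, 0.01785,
                   0.08196, 0.02325, 0.0625, 0.03076, 0.05357,
                   0.01639, 0.11627, 0.00000, 0.04615, 0.00000)
)

# Sum the relative frequencies of all Types within each Family.
family_totals <- aggregate(Distribution ~ Family, data = copuladeletion, FUN = sum)
print(family_totals)
```

If a family's total falls noticeably short of 1 (through rounding or a category missing from the table), "fill" will stretch its segments slightly, so the bar then shows shares of the listed types rather than the raw values.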

edited Mar 29, 2016 at 12:52

answered Mar 29, 2016 at 9:40

mtoto

1 Comment

S. Leu Over a year ago

Thank you for the suggestion! I also really like this approach. I tried creating a histogram like yours yesterday but, for some reason, it never looked this nice. While I will probably use the other option given, I really appreciate your effort, as it might help me understand where I took a wrong step in my coding yesterday!
