I have been trying to work out how to visualize a set of relative frequencies so that they are easy to compare with each other. The distributions don't differ hugely between the groups, which is itself something I'd like the plot to show. I've managed to create a relatively simple point plot, but I don't think it looks good enough.

The code is straightforward (albeit unfinished as far as visual tweaks are concerned), I guess:

library(ggplot2)
copuladeletion <- read.table(text = "Type    Distribution    Family
                             NP  0.39344 Austronesian    
                             NP  0.30232 Mon-Khmer
                             NP  0.3125  Tai-Kadai
                             NP  0.29230 Sinitic
                             NP  0.26785 Other
                             AdjP    0.44262 Austronesian
                             AdjP    0.53488 Mon-Khmer
                             AdjP    0.625   Tai-Kadai
                             AdjP    0.55384 Sinitic
                             AdjP    0.58928 Other
                             AdvP    0.03278 Austronesian
                             AdvP    0.00000 Mon-Khmer
                             AdvP    0.00000 Tai-Kadai
                             AdvP    0.04615 Sinitic
                             AdvP    0.07142 Other
                             EX  0.01639 Austronesian
                             EX  0.02325 Mon-Khmer
                             EX  0.00000 Tai-Kadai
                             EX  0.03076 Sinitic
                             EX  0.01785 Other
                             Clause  0.08196 Austronesian
                             Clause  0.02325 Mon-Khmer
                             Clause  0.0625  Tai-Kadai
                             Clause  0.03076 Sinitic
                             Clause  0.05357 Other
                             Other   0.01639 Austronesian
                             Other   0.11627 Mon-Khmer
                             Other   0.00000 Tai-Kadai
                             Other   0.04615 Sinitic
                             Other   0.00000 Other", header = TRUE)
ggplot(copuladeletion) + geom_point(aes(Distribution, Type, colour=Family,size=1))

Which yields the following image:

[Image: point plot of Distribution (x) against Type (y), coloured by Family]

So, my questions are:

Do you think this visualization works well enough? Are there any preferable options over a simple point plot for these data?

Thank you very much in advance!

  • You can put size outside of aes so it doesn't get mapped to the legend. Since you have some overlapping points, consider adding a small amount of jitter (geom_jitter()). Commented Mar 29, 2016 at 9:33
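A minimal sketch of what this comment suggests, using only the AdvP and EX rows of the question's data (retyped here so the snippet runs on its own). The names `excerpt` and `p`, the point size, and the jitter height are illustrative choices, not something from the original post. Jitter is applied vertically only, so the x positions still show the exact frequencies.

```r
library(ggplot2)

# Excerpt of the question's data: AdvP and EX rows only
families <- c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other")
excerpt <- data.frame(
  Type = rep(c("AdvP", "EX"), each = 5),
  Family = rep(families, times = 2),
  Distribution = c(0.03278, 0.00000, 0.00000, 0.04615, 0.07142,
                   0.01639, 0.02325, 0.00000, 0.03076, 0.01785)
)

# size is a constant, so it goes outside aes() and produces no legend entry;
# width = 0 keeps x exact, height spreads overlapping points vertically
p <- ggplot(excerpt, aes(Distribution, Type, colour = Family)) +
  geom_jitter(size = 3, width = 0, height = 0.15)
p
```

Because the jitter is random, the vertical positions change from run to run; calling `set.seed()` beforehand makes the plot reproducible.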

2 Answers

Perhaps just another take on your strip charts:

library(ggplot2)

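# txt is assumed to hold the table text from the question
# (the string passed to read.table(text = ...) there)
copuladeletion <- read.table(text=txt, header=TRUE)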

gg <- ggplot(copuladeletion) 
gg <- gg + geom_point(aes(Distribution, Type, colour=Family),
                      shape="|", size=10)
gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
gg <- gg + scale_y_discrete(expand=c(0,0))
gg <- gg + scale_colour_brewer(name="", palette="Set1")
gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y")
gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.position="bottom")
gg

[Image: strip chart with one row per Type and a "|" mark at each Distribution value, coloured by Family]

To slightly compensate for the overlaps (as Roman has pointed out a couple of times), you can use a proper line instead of a hack-y point:

gg <- ggplot(copuladeletion) 
gg <- gg + geom_segment(aes(x=Distribution, xend=Distribution,
                            y=0, yend=1, colour=Family), size=0.25)
gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
gg <- gg + scale_y_discrete(expand=c(0,0))
gg <- gg + scale_colour_brewer(name="", palette="Set1")
gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y", switch="y")
gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_rect(color="#2b2b2b", size=0.15))
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text.y=element_text(angle=180))
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.position="bottom")
gg

[Image: the same strip chart drawn with thin vertical line segments, with the Type labels on the left]

You can map linetype as an additional aesthetic as well (and hjust the y labels as you like). These thin lines are kind of hard to read (so tweak size at will), but I do think a strip chart works pretty well for this data. You may want to pull the EX strip out into a separate, zoomed-in plot if you need to (I have no idea what this data is really trying to say :-).
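As a rough sketch of those two tweaks, the snippet below maps linetype to Family and zooms in on the EX strip alone. It assumes ggplot2 3.4.0 or later, where `linewidth` replaces `size` for lines; the names `ex` and `p_ex`, the line width, and the x limits are illustrative choices rather than part of the answer above. The EX values are retyped from the question so the snippet runs on its own.

```r
library(ggplot2)

# EX rows of the question's data only
ex <- data.frame(
  Type = "EX",
  Distribution = c(0.01639, 0.02325, 0.00000, 0.03076, 0.01785),
  Family = c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other")
)

p_ex <- ggplot(ex) +
  # colour and linetype both map to Family, so they share one legend
  geom_segment(aes(x = Distribution, xend = Distribution, y = 0, yend = 1,
                   colour = Family, linetype = Family), linewidth = 0.75) +
  # the y axis carries no information here, so drop its breaks and padding
  scale_y_continuous(breaks = NULL, expand = c(0, 0)) +
  # zoom without discarding data (illustrative limits)
  coord_cartesian(xlim = c(0, 0.05)) +
  labs(x = NULL, y = NULL, title = "EX only, zoomed in") +
  theme_bw() +
  theme(legend.position = "bottom")
p_ex
```

`coord_cartesian()` is used rather than axis limits on the scale because it only changes the visible window and leaves the underlying data untouched.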

3 Comments

Why so many assignments? Is assigning even necessary? It just means more typing and is perhaps less readable. Compared with the points proposed by the OP, this visualization has the downside that it cannot be jittered.
Super great opinion @RomanLuštriz :-) I believe/opine (and can prove speed-wise in the context of composition) this idiom makes it easier to edit ggplot2 plots and I make pretty good plots. I'm not sure jittering is necessary either. If precision is required, another plot with a zoomed-in view can be generated. If folks don't like the assigns, translate to one giant, error-prone, comma-separated theme() parameter list.
This looks awesome, thank you so much! Once I'm done being excited about how cool this is, I'll also try to understand the coding behind it better. :-)

As far as I understand, you are plotting relative frequencies within each family, so as an alternative to your plot, we could visualize the proportion of each Type within each Family using a 100% stacked bar chart.

ggplot(copuladeletion, aes(x = Family, y = Distribution, fill = Type)) +
  geom_bar(stat = "identity", position= "fill") +
  scale_y_continuous("Proportion") +
  scale_x_discrete("", expand = c(0, 0)) +
  coord_flip()

[Image: horizontal 100% stacked bar chart, one bar per Family, filled by Type]
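Since `position = "fill"` rescales every bar to a total of 1, it can be worth checking first how close the values within each family already are to summing to 1. The snippet below is a small sketch of that check in base R; it rebuilds the question's data as a data frame (values retyped from the question), and `family_totals` is just an illustrative name.

```r
families <- c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other")
types <- c("NP", "AdjP", "AdvP", "EX", "Clause", "Other")

# Same values as in the question, one row of numbers per Type
copuladeletion <- data.frame(
  Type = rep(types, each = 5),
  Family = rep(families, times = 6),
  Distribution = c(
    0.39344, 0.30232, 0.3125, 0.29230, 0.26785,  # NP
    0.44262, 0.53488, 0.625, 0.55384, 0.58928,   # AdjP
    0.03278, 0.00000, 0.00000, 0.04615, 0.07142, # AdvP
    0.01639, 0.02325, 0.00000, 0.03076, 0.01785, # EX
    0.08196, 0.02325, 0.0625, 0.03076, 0.05357,  # Clause
    0.01639, 0.11627, 0.00000, 0.04615, 0.00000  # Other
  )
)

# Sum of the relative frequencies within each family
family_totals <- tapply(copuladeletion$Distribution, copuladeletion$Family, sum)
round(family_totals, 3)
```

Any family whose total is noticeably below 1 will be stretched by `position = "fill"`; using `position = "stack"` instead would show the raw totals.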

1 Comment

Thank you for the suggestion! I also really like this approach. I tried creating a histogram like yours yesterday but, for some reason, it never looked this nice. While I will probably use the other option given, I really appreciate your effort, as it might help me understand where I went wrong in my code yesterday!
