How to better create stacked bar graphs with multiple variables from ggplot2?

Question

I often have to make stacked barplots to compare variables, and because I do all my stats in R, I prefer to do all my graphics in R with ggplot2. I would like to learn how to do two things:

First, I would like to be able to add proper percentage tick marks for each variable rather than tick marks by count. Counts would be confusing, which is why I take out the axis labels completely.

Second, there must be a simpler way to reorganize my data to make this happen. It seems like the sort of thing I should be able to do natively in ggplot2 with plyr, but the plyr documentation is not very clear to me (and I have read both the ggplot2 book and the online plyr documentation).

My best graph so far looks like this:

[Image: example graph]

The R code I use to create it is the following:

library(epicalc)  

### recode the variables to factors ###
recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA), 
c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA))

### Combine recoded variables to a common vector
Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ)


### Create a second vector to label the first vector by original variable ###  
a1<-rep("News about Bangladesh", length(int_newcoun))
a2<-rep("Neighboring Countries", length(int_newneigh))
[...]
a17<-rep("Education", length(int_educ))


Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17)

### Create a Weighting vector of the proper length ###
Interest.weight<-rep(weight, 17)

### Make and save a new data frame from the three vectors ###
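## data.frame() keeps each column's type; cbind() would build a character matrix,
## turning the weights into text
Interest.df<-data.frame(Interest1, Interest2, Interest.weight, stringsAsFactors = TRUE)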

write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv')

### Sort the factor levels to display properly ###

Interest.df$Interest1<-relevel(Interest.df$Interest1, ref='Not Very Interested')
Interest.df$Interest1<-relevel(Interest.df$Interest1, ref='Somewhat Interested')
Interest.df$Interest1<-relevel(Interest.df$Interest1, ref='Very Interested')

Interest.df$Interest2<-relevel(Interest.df$Interest2, ref='News about Bangladesh')
Interest.df$Interest2<-relevel(Interest.df$Interest2, ref='Education')
[...]
Interest.df$Interest2<-relevel(Interest.df$Interest2, ref='European Politics')

## attach()/detach() are not needed: ggplot() reads the columns from the data frame it is given

### Finally create the graph in ggplot2 ###

library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
p<-ggplot(Interest.df, aes(Interest2))
p<-p+geom_bar(aes(weight=Interest.weight, fill=Interest1))  # bar height = weighted count
p<-p+coord_flip()
p<-p+scale_y_continuous(NULL, breaks=NULL)  # hide the count axis
p<-p+scale_fill_manual(values = rev(brewer.pal(5, "Purples")))
p<-p+labs(fill=NULL, x=NULL, y=NULL)
p
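The long data frame assembled by hand above (one combined response vector, one label vector built with rep(), and a repeated weight vector) can also be built with base R's stack(). This is a minimal sketch, not the asker's data: the data frame `survey`, its two int_* columns, the weights and the `question_labels` lookup are invented for illustration, and the coding (1 to 4 substantive, 5 to 9 non-substantive) is assumed from the recode() call.

```r
# Hypothetical toy data: two of the 17 int_* columns plus a survey weight.
survey <- data.frame(
  int_newcoun = c(1, 2, 2, 4, 9),
  int_educ    = c(1, 1, 3, 2, NA),
  weight      = c(0.8, 1.2, 1.0, 1.1, 0.9)
)

# Hypothetical lookup from column name to display label.
question_labels <- c(int_newcoun = "News about Bangladesh",
                     int_educ    = "Education")

# stack() returns 'values' (all responses, column by column) and
# 'ind' (a factor naming the column each value came from).
long <- stack(survey[names(question_labels)])

# Rows are stacked column by column, so the weights repeat once per column,
# like rep(weight, 17) in the question.
long$weight <- rep(survey$weight, times = length(question_labels))

# Assumed coding: 1-4 are substantive answers; 5-9 (and NA) become NA.
long$interest <- factor(long$values, levels = 1:4,
                        labels = c("Very Interested", "Somewhat Interested",
                                   "Not Very Interested", "Not At All interested"))

# Display labels, kept in the order given in question_labels.
long$question <- factor(unname(question_labels[as.character(long$ind)]),
                        levels = unname(question_labels))
head(long)
```

The resulting `long` can be passed to ggplot() with `question` on the x axis, `interest` as the fill and `weight` as the weight aesthetic.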

I'd very much appreciate any tips, tricks or hints.

Instead of calling relevel() many times, you could call factor() once with the levels argument. You could also check reorder(), which can sort your levels by some variable (the percentage of "Very Interested"?).
– Marek, Commented Apr 6, 2010 at 20:45
Nice colours! I think I'll use the Brewer purples myself one day :-)
– Andreas, Commented Apr 7, 2010 at 8:20
Do you want a workflow to produce the data going into a chart like that, plus the ability to add the percentage values on top of each fill grouping in each bar?
– Jay, Commented Apr 7, 2010 at 17:22
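Marek's first suggestion can be checked on a few made-up responses: the three relevel() calls from the question and a single factor() call with an explicit levels vector produce the same factor. The `responses` vector is hypothetical.

```r
# Hypothetical responses
responses <- c("Somewhat Interested", "Very Interested", "Not Very Interested",
               "Not At All interested", "Very Interested")

# The relevel() chain from the question: each call moves one level to the
# front, so the level named in the last call ends up first.
f1 <- factor(responses)
f1 <- relevel(f1, ref = "Not Very Interested")
f1 <- relevel(f1, ref = "Somewhat Interested")
f1 <- relevel(f1, ref = "Very Interested")

# One factor() call that states the whole order explicitly.
f2 <- factor(responses,
             levels = c("Very Interested", "Somewhat Interested",
                        "Not Very Interested", "Not At All interested"))

identical(f1, f2)
```

Spelling out the full order once also keeps it visible in the code, instead of leaving it implied by the order of the relevel() calls.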

Brandon Bertelsen · Accepted Answer · 2010-09-24 16:10:26Z

Your second problem can be solved with melt() and cast() from the reshape package.

After you've converted the columns of your data.frame (called your.df below) to factors, you can use something like:

install.packages("reshape")
library(reshape)
library(ggplot2)

x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations

x <- cast(x, variable + value ~ ., length) ## count each response within each variable
colnames(x) <- c("variable","value","freq")
## Presto! position = "fill" rescales each bar to 100%
ggplot(x, aes(variable, freq, fill = value)) + geom_col(position = "fill") + coord_flip() + scale_y_continuous("", labels = scales::percent)
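What position = "fill" draws is each response's share within its bar. If the percentages are also needed as numbers (for a table, or for Jay's idea of printing them on the bars), they can be computed in base R from the same long data with table() and prop.table(). A sketch on an invented data frame shaped like melt() output, with columns `variable` and `value`:

```r
# Hypothetical long data shaped like melt() output:
# 'variable' names the question, 'value' holds one response.
x <- data.frame(
  variable = rep(c("Education", "Religion"), times = c(4, 5)),
  value    = c("Very", "Very", "Somewhat", "Not Very",
               "Very", "Somewhat", "Somewhat", "Somewhat", "Not Very")
)

# Counts per question and response (the numbers cast(..., length) produces)
counts <- table(x$variable, x$value)

# Divide each row by its total: every question's shares sum to 1,
# which is what position = "fill" draws
shares <- prop.table(counts, margin = 1)
round(100 * shares, 1)
```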

As an aside, I like to use grep to pull in columns from a messy import. For example:

x <- your.df[, grep("^int_", names(your.df))] ## pulls all columns whose names start with "int_"
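grep() treats its pattern as a regular expression and searches whatever character vector it is given, so for column selection it needs the column names, and the pattern needs care: "." matches any character and an unanchored pattern can match in the middle of a name. A sketch with invented column names:

```r
# Invented column names from a "messy import"
df <- data.frame(int_educ = 1:2, paint_col = 3:4, weight = 5:6)

# "." is a regex wildcard and the pattern is not anchored,
# so "int." also matches the "int_" inside "paint_col"
names(df)[grep("int.", names(df))]

# "^" anchors the match at the start of the name
names(df)[grep("^int_", names(df))]

# Fixed-string alternative (startsWith() is available from R 3.3.0)
int_cols <- df[, startsWith(names(df), "int_"), drop = FALSE]
names(int_cols)
```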

And factoring is easier when you don't have to type c(' ', ...) a million times.

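int.labels <- strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All interested
', '\n')[[1]][-1]   ## [-1] drops the empty string before the first label

## Codes 1-4 get these labels; any code not listed in levels (5-9) becomes NA
for (i in seq_len(ncol(x))) {
  x[, i] <- factor(x[, i], levels = 1:4, labels = int.labels)
}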

Jérôme Verstrynge · Accepted Answer · 2011-09-14 19:14:16Z

2

You don't need prop.table() or count calculations to make 100% stacked bars. You just need + geom_bar(position = "fill").
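The difference between the two position adjustments can be checked on the built-in mtcars data: "stack" piles up raw counts, while "fill" rescales every bar to a total of 1, which percent labels then show as 100%. The sketch only builds the plots if ggplot2 is installed; layer_data() returns the computed bar coordinates.

```r
# Build the plots only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p_base <- ggplot(mtcars, aes(factor(cyl), fill = factor(gear)))

  # position = "stack": each bar's height is the number of cars per cyl
  stacked <- p_base + geom_bar(position = "stack")

  # position = "fill": every bar is rescaled to a total of 1
  filled <- p_base + geom_bar(position = "fill") +
    scale_y_continuous(labels = scales::percent)

  # Top of each bar (largest ymax among its segments), one value per x position
  bar_tops <- function(p) {
    d <- layer_data(p)
    tapply(d$ymax, unclass(d$x), max)
  }
  print(bar_tops(stacked))  # bar totals: 11, 7 and 14 cars
  print(bar_tops(filled))   # every bar reaches 1
}
```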

edited Sep 14, 2011 at 19:14

Jérôme Verstrynge


answered Sep 3, 2010 at 12:28

stevepowell99


aL3xa · Accepted Answer · 2010-04-05 19:32:59Z

1

About percentages instead of ..count.. (written after_stat(count) in current ggplot2), try:

ggplot(mtcars, aes(factor(cyl), after_stat(prop.table(count) * 100))) + geom_bar()

But since it's not a good idea to shove a function into aes(), you can write a custom function that turns the counts into percentages, rounds them to n decimals, etc.

You labeled this post with plyr, but I don't see any plyr in action here, and I bet that a single ddply() call can do the job. The online plyr documentation should suffice.
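As a base-R sketch of the per-group calculation meant here, the percentages within each cyl group of mtcars can be computed with ave(); with plyr this should correspond to ddply(counts, "cyl", transform, pct = 100 * Freq / sum(Freq)). The pre-computed percentages are then drawn with geom_col(), and position_stack(vjust = 0.5) places a label in the middle of each segment. Plotting is skipped if ggplot2 is not installed.

```r
# Counts of cars per cyl/gear combination, in long format (columns cyl, gear, Freq)
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# Split-apply-combine with ave(): divide each count by the total of its cyl group
counts$pct <- with(counts, 100 * Freq / ave(Freq, cyl, FUN = sum))

# Drop empty combinations (no 8-cylinder car has 4 gears) before plotting
counts <- subset(counts, Freq > 0)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(cyl, pct, fill = gear)) +
    geom_col() +                                   # each bar already sums to 100
    geom_text(aes(label = round(pct)),
              position = position_stack(vjust = 0.5)) +
    coord_flip() +
    labs(x = NULL, y = "Percent", fill = "gear")
  print(p)
}
```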

answered Apr 5, 2010 at 19:32

aL3xa


Community · Accepted Answer · 2020-06-20 09:12:55Z

1

If I am understanding you correctly, to fix the axis labeling problem make the following change:

# p<-ggplot(Interest, aes(Interest2, ..count..))
p<-ggplot(Interest, aes(Interest2, ..density..))

As for the second one, I think you would be better off working with the reshape package. You can use it to aggregate data into groups very easily.

In reference to aL3xa's comment below...

library(ggplot2)
d <- data.frame(r = rnorm(1000))
## geom_histogram() bins the continuous r; in current ggplot2, geom_bar() only counts
ggplot(d, aes(r, after_stat(density))) + geom_histogram()

Returns...

[Image: density histogram, http://www.drewconway.com/zia/wp-content/uploads/2010/04/density.png]

The bins are now densities...

edited Jun 20, 2020 at 9:12

CommunityBot


answered Apr 5, 2010 at 19:37

DrewConway


3 Comments

aL3xa Over a year ago

Have you tried your syntax? You have omitted a geom_bar layer... however, if you pass ..density.. with geom_bar, you'll get several equally-sized bars. Please try to add geom_bar() and see what happens.

Matt Parker Over a year ago

It works well with continuous vars but produces the full-length bars with factors and character vectors, presumably because the density calculation doesn't know what to do with a non-continuous x. Replace r with something like f <- sample(c("Agree", "No opinion", "Disagree"), size = 1000, replace = TRUE, prob = c(.2, .5, .3)). I've run into this a number of times before, because I like density histograms and I like ggplot, but I haven't figured out a way to get it to behave yet (though I haven't tried very hard, either).

Matt Parker Over a year ago

...er, just remembered something. The reason it doesn't work is because density-based histograms use the area of the bars, and so need both numeric x and y axes.
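For a factor on the x axis, current ggplot2 (version 3.3.0 and later, where after_stat() replaced the ..density.. style) offers the prop variable computed by geom_bar()'s counting stat. prop is calculated within each group, so group = 1 is needed to give all bars one common denominator. A sketch using Matt Parker's simulated responses (the seed is an arbitrary choice):

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible
f <- sample(c("Agree", "No opinion", "Disagree"), size = 1000,
            replace = TRUE, prob = c(.2, .5, .3))
d <- data.frame(f = f)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # group = 1 puts all bars in one group, so prop = count / total count
  p <- ggplot(d, aes(f, after_stat(prop), group = 1)) +
    geom_bar() +
    scale_y_continuous(labels = scales::percent)
  print(p)
}
```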

Andreas · Accepted Answer · 2010-04-07 08:19:23Z

1

Your first question: Would this help?

geom_bar(aes(y = after_stat(count / sum(count))))
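One caveat with count / sum(count) when the bars are also split by fill: the sum runs over every segment in the layer, so each stacked bar shows its share of all responses rather than reaching 100%. A sketch on mtcars (not the asker's data) that checks the bar tops; for within-bar percentages, position = "fill" is the usual choice.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # y = share of ALL cars, because sum(count) runs over every segment in the layer
  p <- ggplot(mtcars, aes(factor(cyl), fill = factor(gear))) +
    geom_bar(aes(y = after_stat(count / sum(count)))) +
    scale_y_continuous(labels = scales::percent)

  d <- layer_data(p)
  bar_tops <- tapply(d$ymax, unclass(d$x), max)
  print(bar_tops * 32)  # multiplied by the 32 cars: back to counts per cyl
}
```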

Your second question: could you use reorder to sort the bars? Something like

aes(reorder(Interest, Value, mean), Value)

(just back from a seven hour drive - am tired - but I guess it should work)
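Andreas's reorder() idea, combined with Marek's comment on the question, can sort the questions by their share of "Very Interested" answers: reorder() with FUN = mean applied to a logical vector scores each level by its proportion of TRUE values. A sketch on invented long data; the question names and the shortened answer labels are hypothetical.

```r
# Hypothetical long data: one row per respondent and question,
# with answers shortened to "Very" and "Not".
long <- data.frame(
  question = rep(c("Education", "Health", "Sports"), each = 4),
  answer   = c("Very", "Very", "Not", "Not",    # Education: 2 of 4 "Very"
               "Very", "Not",  "Not", "Not",    # Health:    1 of 4
               "Very", "Very", "Very", "Not")   # Sports:    3 of 4
)

# mean() of a logical vector is the proportion of TRUE values, so each
# question is scored by its share of "Very" answers; levels sort ascending.
long$question <- reorder(factor(long$question), long$answer == "Very", FUN = mean)

levels(long$question)
attr(long$question, "scores")
```

With coord_flip(), the first level is drawn at the bottom, so the question with the largest share ends up at the top of the chart.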

edited Apr 7, 2010 at 8:19

answered Apr 6, 2010 at 19:31

Andreas


1 Comment

Andreas Over a year ago

sorry - I assumed you had a melted dataframe.
