9

I often have to make stacked barplots to compare variables, and because I do all my stats in R, I prefer to do all my graphics in R with ggplot2. I would like to learn how to do two things:

First, I would like to be able to add proper percentage tick marks for each variable rather than tick marks by count. Counts would be confusing, which is why I take out the axis labels completely.

Second, there must be a simpler way to reorganize my data to make this happen. It seems like the sort of thing I should be able to do natively in ggplot2 with plyR, but the documentation for plyR is not very clear (and I have read both the ggplot2 book and the online plyR documentation.

My best graph looks like this, the code to create it follows:

example graph

The R code I use to get it is the following:

library(epicalc)  

### recode the variables to factors ###
recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA), 
c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA))

### Combine recoded variables to a common vector
Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ)


### Create a second vector to label the first vector by original variable ###  
a1<-rep("News about Bangladesh", length(int_newcoun))
a2<-rep("Neighboring Countries", length(int_newneigh))
[...]
a17<-rep("Education", length(int_educ))


Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17)

### Create a Weighting vector of the proper length ###
Interest.weight<-rep(weight, 17)

### Make and save a new data frame from the three vectors ###
Interest.df<-cbind(Interest1, Interest2, Interest.weight)
Interest.df<-as.data.frame(Interest.df)

write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv')

### Sort the factor levels to display properly ###

Interest.df$Interest1<-relevel(Interest$Interest1, ref='Not Very Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Somewhat Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Very Interested')

Interest.df$Interest2<-relevel(Interest$Interest2, ref='News about Bangladesh')
Interest.df$Interest2<-relevel(Interest$Interest2, ref='Education')
[...]
Interest.df$Interest2<-relevel(Interest$Interest2, ref='European Politics')

detach(Interest)
attach(Interest)

### Finally create the graph in ggplot2 ###

library(ggplot2)
p<-ggplot(Interest, aes(Interest2, ..count..))
p<-p+geom_bar((aes(weight=Interest.weight, fill=Interest1)))
p<-p+coord_flip()
p<-p+scale_y_continuous("", breaks=NA)
p<-p+scale_fill_manual(value = rev(brewer.pal(5, "Purples")))
p
update_labels(p, list(fill='', x='', y=''))

I'd very much appreciate any tips, tricks or hints.

4
  • Instead of relevel many times you could use once factor with labels argument. You could also check reorder which could sort your levels by some variable (percent of "very interested"?) Commented Apr 6, 2010 at 20:45
  • 1
    Nice colours - think i'll use brewer purples my self one day :-) Commented Apr 7, 2010 at 8:20
  • Do you want a work flow to produce the data going into a chart like that plus be able to add the percentage values on top of each fill grouping in each bar? Commented Apr 7, 2010 at 17:22
  • ideally the work flow would produce both. Commented Apr 18, 2010 at 22:06

5 Answers 5

2

Your second problem can be solved with melt and cast from the reshape package

After you've factored the elements in your data.frame called you can use something like:

install.packages("reshape")
library(reshape)

x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations

x <- cast(x, variable + value ~., length)
colnames(x) <- c("variable","value","freq")
## Presto!
ggplot(x, aes(variable, freq, fill = value)) + geom_bar(position = "fill") + coord_flip() + scale_y_continuous("", formatter="percent")

As an aside, I like to use grep to pull in columns from a messy import. For example:

x <- your.df[,grep("int.",df)] ## pulls all columns starting with "int_"

And factoring is easier when you don't have to type c(' ', ...) a million times.

for(x in 1:ncol(x)) { 
df[,x] <- factor(df[,x], labels = strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All interested
NA
NA
NA
NA
NA
NA
', '\n')[[1]][-1]
}
Sign up to request clarification or add additional context in comments.

Comments

2

You don't need prop.tables or count etc to do the 100% stacked bars. You just need +geom_bar(position="stack")

Comments

1

About percentages insted of ..count.. , try:

ggplot(mtcars, aes(factor(cyl), prop.table(..count..) * 100)) + geom_bar()

but since it's not a good idea to shove a function into the aes(), you can write custom function to create percentages out of ..count.. , round it to n decimals etc.

You labeled this post with plyr, but I don't see any plyr in action here, and I bet that one ddply() can do the job. Online plyr documentation should suffice.

Comments

1

If I am understanding you correctly, to fix the axis labeling problem make the following change:

# p<-ggplot(Interest, aes(Interest2, ..count..))
p<-ggplot(Interest, aes(Interest2, ..density..))

As for the second one, I think you would be better off working with the reshape package. You can use it to aggregate data into groups very easily.

In reference to aL3xa's comment below...

library(ggplot2)
r<-rnorm(1000)
d<-as.data.frame(cbind(r,1:1000))
ggplot(d,aes(r,..density..))+geom_bar()

Returns...

alt text http://www.drewconway.com/zia/wp-content/uploads/2010/04/density.png

The bins are now densities...

3 Comments

Have you tried your syntax? You have omitted a geom_bar layer... however, if you pass ..density.. with geom_bar, you'll get several equally-sized bars. Please try to add geom_bar() and see what happens.
It works well with continuous vars but produces the full-length bars with factors and character vectors, presumably because the density calculation doesn't know what to do with a non-continuous x. Replace r with something like f <- sample(c("Agree", "No opinion", "Disagree"), size = 1000, replace = TRUE, prob = c(.2, .5, .3)). I've run into this a number of times before, because I like density histograms and I like ggplot, but I haven't figured out a way to get it to behave yet (though I haven't tried very hard, either).
...er, just remembered something. The reason it doesn't work is because density-based histograms use the area of the bars, and so need both numeric x and y axes.
1

Your first question: Would this help?

geom_bar(aes(y=..count../sum(..count..)))

Your second question; could you use reorder to sort the bars? Something like

aes(reorder(Interest, Value, mean), Value)

(just back from a seven hour drive - am tired - but I guess it should work)

1 Comment

sorry - I assumed you had a melted dataframe.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.