
Is there a way to sum data with ggplot2?

I want to draw a bubble map where the size of each point depends on the sum of z.

Currently I'm doing something like:

dd <- ddply(d, .(x,y), transform, z=sum(z))
qplot(x,y, data=dd, size=z)

But I feel I'm writing the same thing twice. I would like to be able to write something like:

qplot(x,y, data=d, size=sum(z))

I had a look at stat_sum and stat_summary, but I'm not sure they are appropriate either.

Is it possible to do it with ggplot2? If not, what would be the best way to write those 2 lines?

2 Answers

8

It can be done using stat_sum within ggplot2. By default, the dot size represents proportions. To make the dot size represent counts, use size = ..n.. as an aesthetic. To get totals (or proportions) of a third variable instead of row counts, weight by that variable, for example weight = cost, as an aesthetic. Some examples follow, but first, some data.

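library(ggplot2)
library(plyr)
set.seed(321)  # make the sampled counts reproducible
# Generate some data: a 5 x 5 grid with a distinct count (1 to 25) per cell
df <- expand.grid(x = 1:5, y = 1:5, KEEP.OUT.ATTRS = FALSE)
df$Count <- sample(1:25, 25, replace = FALSE)
# Repeat each (x, y) pair Count times, giving one row per unit (325 rows)
new <- dlply(df, .(Count), function(data) matrix(rep(matrix(c(data$x, data$y), ncol = 2), data$Count), byrow = TRUE, ncol = 2))
# data.frame() names the two columns X1 and X2
df2 <- data.frame(do.call(rbind, new))
df2$cost <- 1:325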

The data contains one row per unit. Units are categorised according to two factors, X1 and X2, and a third variable, cost, gives the cost of each unit.
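Before plotting, it can help to see the numbers the dots will encode. The following is a minimal base R sketch, not part of the original answer: it rebuilds the same kind of data without plyr (the row order should match the plyr version because every Count value is distinct) and tabulates the number of units and the total cost per X1 - X2 cell. The names units_per_cell and cost_per_cell are made up for illustration.

```r
# Rebuild the example data in base R (assumes all Count values are distinct,
# as they are here, so sorting by Count mirrors the dlply() split order).
set.seed(321)
df <- expand.grid(x = 1:5, y = 1:5, KEEP.OUT.ATTRS = FALSE)
df$Count <- sample(1:25, 25, replace = FALSE)
df <- df[order(df$Count), ]
df2 <- data.frame(
  X1 = rep(df$x, times = df$Count),
  X2 = rep(df$y, times = df$Count)
)
df2$cost <- 1:325

# Number of units in each X1-X2 cell (the quantity Plot 2 maps to dot size)
units_per_cell <- xtabs(~ X1 + X2, data = df2)
# Total cost in each X1-X2 cell (the quantity Plot 3 maps to dot size)
cost_per_cell <- xtabs(cost ~ X1 + X2, data = df2)

units_per_cell
cost_per_cell
```

Dividing either table by its grand total gives the proportions that the group = 1 plots display.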

Plot 1: Plots the proportion of elements at each X1 - X2 combination. group=1 tells ggplot to calculate proportions out of the total number of units in the data frame.

ggplot(df2, aes(factor(X1), factor(X2))) + 
  stat_sum(aes(group = 1))


Plot 2: Plots the number of elements at each X1 - X2 combination.

ggplot(df2, aes(factor(X1), factor(X2))) + 
  stat_sum(aes(size = ..n..))


Plot 3: Plots the total cost of the elements at each X1 - X2 combination, that is, the count weighted by the third variable.

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = 1, weight = cost, size = ..n..)) 

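Applied to the original question, the Plot 3 approach sums z per (x, y) position without a separate ddply step. The sketch below makes two assumptions: the data frame d is hypothetical stand-in data, and it uses after_stat(n), which, as far as I know, is the replacement for the ..n.. notation in ggplot2 3.3.0 and later. With an older ggplot2, size = ..n.. should play the same role.

```r
library(ggplot2)

# Hypothetical data standing in for the question's `d`:
# repeated (x, y) pairs, each row carrying a value z to be summed.
set.seed(1)
d <- data.frame(
  x = sample(1:3, 30, replace = TRUE),
  y = sample(1:3, 30, replace = TRUE),
  z = runif(30)
)

# weight = z makes stat_sum() add up z within each (x, y) position;
# size = after_stat(n) maps that weighted total to the dot size.
p <- ggplot(d, aes(x, y)) +
  stat_sum(aes(weight = z, size = after_stat(n))) +
  labs(size = "sum of z")
p

# Explicit aggregation, only used to cross-check the plotted values
sums <- aggregate(z ~ x + y, data = d, FUN = sum)
```

The values the layer actually computed can be inspected with layer_data(p); its n column should agree with the z column of sums.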

Plot 4: Plots the proportion of the total cost of all elements in the data frame at each X1 - X2 combination

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = 1, weight = cost)) 


Plot 5: Plots proportions, but instead of the proportion being out of the total cost across all elements in the data frame, the proportion is out of the cost for elements within each category of X1. That is, within each X1 category, where does the major cost for X2 units occur?

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = X1, weight = cost)) 



3 Comments

Is the answer to my question plot #3 then ?
Sorry, I should have said so. Yes.
@SandyMuspratt For all plots with proportions (e.g., Plot 1), I had to explicitly ask ggplot to plot the proportion using aes(group = 1, size = ..prop..). As this is an old post, the default may depend on the ggplot2 version, but this could still help someone!
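To illustrate the point in the last comment, here is a small sketch that requests proportions explicitly. It assumes a ggplot2 version that provides after_stat() (3.3.0 or later, to my knowledge); the data frame units and the object p_prop are hypothetical and exist only for this example.

```r
library(ggplot2)

# Small hypothetical data set: 6 units spread over five X1-X2 cells
units <- data.frame(
  X1 = c("a", "a", "a", "b", "b", "c"),
  X2 = c("u", "u", "v", "u", "v", "v")
)

# group = 1 makes the proportions relative to the whole data frame;
# size = after_stat(prop) asks for proportions explicitly instead of
# relying on whatever the default size mapping happens to be.
p_prop <- ggplot(units, aes(X1, X2)) +
  stat_sum(aes(group = 1, size = after_stat(prop)))
p_prop
```

With group = 1 the proportions across all dots should add up to 1; dropping group = 1 would compute them within each X1 - X2 cell, where every proportion is trivially 1.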
2

You could put the ddply call into the qplot:

library(plyr)
library(ggplot2)
d <- data.frame(x=1:10, y=1:10, z= runif(100))
qplot(x, y, data=ddply(d, .(x,y), transform, z=sum(z)), size=z)

Or use the data.table package:

library(data.table)
DT <- data.table(d, key='x,y')
# The unnamed sum(z) column is called V1 in the result
qplot(x, y, data=DT[, sum(z), by='x,y'], size=V1)

1 Comment

I know I can do that. Your solutions are equivalent to my first attempt. I want to avoid having to specify 'x,y' twice (whether on the same line or on 2 different lines).
