Given the following ggplot2 chart:

ggplot(my_data, aes(colour=my_factor)) +
                geom_point(aes(x=prior, y=current)) +
                facet_grid(gender ~ age)

I would like the size of each point to be proportional to the count of my_factor for that prior/current combination.

ggplot(my_data, aes(colour=my_factor,
                size=<something-here>(my_factor))) +
                geom_point(aes(x=prior, y=current)) +
                facet_grid(gender ~ age)

Any ideas?

== Edit ==

Here's a very trivial example based on the mpg dataset. Let's define "great_hwy" as hwy > 35, and "great_cty" as cty > 25:

mpg$great_hwy[mpg$hwy > 35]  <- 1
mpg$great_hwy[mpg$hwy <= 35] <- 0
mpg$great_hwy <- factor(mpg$great_hwy)

mpg$great_cty[mpg$cty > 25]  <- 1
mpg$great_cty[mpg$cty <= 25] <- 0
mpg$great_cty <- factor(mpg$great_cty)

If we plot great_hwy vs. great_cty, it won't tell us much:

ggplot(mpg) + geom_point(aes(x=great_cty, y=great_hwy))

How could I make each data point bigger depending on the number of observations that share its x/y combination? Hope this clears it up, but let me know otherwise.

  • A small data sample would be very helpful here. You can choose one from ?datasets if you want. Commented Oct 2, 2009 at 19:56
  • I don't understand what you mean by "the count of my_factor for that prior/current combination." Is there more than one data point for each x/y? So you're looking for a solution to the overplotting issue? Or do you mean something else? Commented Oct 2, 2009 at 19:59
  • @Shane, I'm working on a better example as per your suggestion. @Harlan, there are many data points for each x/y. I would like to plot one data point for each x/y, and I want the size of said data point to be proportional to the number of x/y pairs. Commented Oct 2, 2009 at 20:11

2 Answers

You can certainly do this by counting external to ggplot, but one of the great things about ggplot is that you can do many of these statistics internally!

Using your mpg example above:

ggplot(mpg) + 
  geom_point(aes(x=great_cty, y=great_hwy, 
                 size=..count..), stat="bin")
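The "counting external to ggplot" route mentioned above can be sketched as follows. This is only an illustration, not part of the original answer: the data frame name `counts` and the column name `n` are made up here, and the indicator factors are rebuilt in a single step rather than with the question's two-step assignment.

```r
library(ggplot2)

# Recreate the two indicator factors from the question.
mpg$great_hwy <- factor(as.integer(mpg$hwy > 35))
mpg$great_cty <- factor(as.integer(mpg$cty > 25))

# Count the rows for each great_cty/great_hwy combination with base R.
# `counts` and the column name `n` are illustrative choices.
counts <- as.data.frame(
  table(great_cty = mpg$great_cty, great_hwy = mpg$great_hwy),
  responseName = "n"
)
counts <- counts[counts$n > 0, ]  # drop combinations that never occur

# One point per combination, sized by its precomputed count.
p <- ggplot(counts) +
  geom_point(aes(x = great_cty, y = great_hwy, size = n))
print(p)
```

Because the counts are computed before plotting, this version does not rely on any statistic inside ggplot2, at the cost of an extra data-preparation step.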

4 Comments

Exactly what I was looking for. Looks like most cars are not great in terms of city and highway mileage ;)
You might also want to check out this page, just to make sure that the size of the points is what you think it is (radius? area?): had.co.nz/ggplot2/scale_size.html I think having proportional areas is traditionally preferred to proportional radii.
Yes, but ggplot2 doesn't do that because it only works for points - not (e.g.) lines or text. scale_area is strongly recommended for points!
I believe this now throws a warning in new versions of ggplot (I hope this won't break this approach in the future): Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)

Because the accepted answer uses a deprecated feature, I'll point out this alternate answer, which works for ggplot2 1.0.1:

ggplot2 visualizing counts of points plotted on top of each other: stat_bin2d or geom_tile or point size?
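For readers on a more recent ggplot2, here is a minimal sketch of one non-deprecated route. It assumes an installed version that provides geom_count() and scale_size_area(); neither function is named in this thread, and scale_size_area() is assumed to be the current counterpart of the scale_area mentioned in the comments. It is not necessarily the approach taken in the linked answer.

```r
library(ggplot2)

# Recreate the two indicator factors from the question.
mpg$great_hwy <- factor(as.integer(mpg$hwy > 35))
mpg$great_cty <- factor(as.integer(mpg$cty > 25))

# geom_count() counts the observations at each x/y location and maps
# that count to point size; scale_size_area() makes the point area
# (rather than the radius) proportional to the count.
p <- ggplot(mpg, aes(x = great_cty, y = great_hwy)) +
  geom_count() +
  scale_size_area()
print(p)
```

This keeps the counting inside ggplot2, as the accepted answer does, without mapping y while using stat="bin".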
