I'm trying to plot a histogram with 2 sets of data using ggplot2. My dataset has 418 values to plot, with 2 groups of data (so there will be 2 sets of coloured bars on my histogram). Annoyingly I can't reproduce the problem with the iris dataset:

library(ggplot2)
ggplot(iris, aes(x=iris[,1], fill=iris[,5])) + 
geom_histogram(binwidth=.5,alpha=.5)

This creates a histogram fine. When I try it on my data I get:

Error : cannot allocate vector of size 4.0 Gb
In addition: Warning messages:
1: In anyDuplicated.default(breaks) :
  Reached total allocation of 16366Mb: see help(memory.size)
2: In anyDuplicated.default(breaks) :
  Reached total allocation of 16366Mb: see help(memory.size)
3: In anyDuplicated.default(breaks) :
  Reached total allocation of 16366Mb: see help(memory.size)
4: In anyDuplicated.default(breaks) :
  Reached total allocation of 16366Mb: see help(memory.size)
Error in UseMethod("scale_dimension") : 
  no applicable method for 'scale_dimension' applied to an object of class "NULL"

I have 16GB of memory, so producing a plot with 418 data points shouldn't be an issue.

Any help much appreciated.


It turns out that my data still won't plot, even when referring to column names. I think this is due to the range of the data: after log-transforming it, the histogram plots. It seems that ggplot2, or R as a whole, doesn't like a range of 1-165476109, which is understandable...
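A rough sketch of why the range matters, assuming the same binwidth=.5 was used on the real data (the question does not show that call). The data frame df and its columns value and group are hypothetical stand-ins, since the actual dataset isn't shown; only the 418 rows, the 2 groups and the 1-165476109 range come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: 418 values in 2 groups,
# spread over roughly 1 to 165476109 (the range quoted above).
set.seed(1)
df <- data.frame(
  value = 10^runif(418, min = 0, max = log10(165476109)),
  group = rep(c("a", "b"), each = 209)
)

# Number of bins a linear binwidth of 0.5 would need over that range.
# Hundreds of millions of bins is a plausible source of the
# multi-gigabyte allocation in the error message.
n_bins <- (165476109 - 1) / 0.5
n_bins

# Binning on a log10 axis keeps the bin count small (30 here).
p <- ggplot(df, aes(x = value, fill = group)) +
  geom_histogram(bins = 30, alpha = .5) +
  scale_x_log10()
p
```

If a linear axis is needed, choosing a much larger binwidth, or setting bins instead of binwidth, should have a similar effect on memory use.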

1 Answer

Your code should look like this:

ggplot(iris, aes(x=Sepal.Length, fill=Species)) + 
  geom_histogram(binwidth=.5,alpha=.5)

The reason is that arguments inside aes() are evaluated in the context of your data. This means your mapping should refer to column names in your data (i.e. x=Sepal.Length).

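When you write the aes() call the way you did, you hand ggplot two standalone vectors of 150 values each instead of column names. That happens to work for iris, but the mapping is no longer tied to the data argument, so it breaks as soon as the data is subset or otherwise differs in length from those vectors - this clearly isn't what you had in mind.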
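To illustrate why column names are the safer mapping, here is a small sketch that plots a subset of iris both ways. The names two_species, p_ok, p_bad and res are made up for the example, and ggplot_build() is used only to force the aesthetics to be evaluated without drawing anything.

```r
library(ggplot2)

# Subset of iris: 100 rows instead of 150
two_species <- subset(iris, Species != "setosa")

# Column names are looked up in whatever data frame is supplied,
# so this mapping follows the subset automatically.
p_ok <- ggplot(two_species, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = .5, alpha = .5)
built <- ggplot_build(p_ok)

# Raw vectors still carry all 150 values from the full iris data,
# which no longer matches the 100 rows passed as data.
p_bad <- ggplot(two_species, aes(x = iris[, 1], fill = iris[, 5])) +
  geom_histogram(binwidth = .5, alpha = .5)

# Capture the failure instead of stopping the script; ggplot2 is
# expected to complain that the aesthetic lengths do not match the data.
res <- tryCatch(ggplot_build(p_bad), error = function(e) e)
inherits(res, "error")
```

The first plot builds because Sepal.Length and Species are resolved inside two_species; the second depends on a vector that happens to live outside it.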

1 Comment

Hello, thanks for your pointers to improve my code. It turns out that my data still won't plot, even when referring to column names. I think this is due to the range in the data. After log transforming the data, the histogram plots. Thanks.
