
I have the following skewed data:

set.seed(3)
x <- rgamma(1e6, 0.1, .2)

I then looked at the log-transformed distribution of the data:

summary(log(x))
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# -170.637  -12.760   -5.825   -8.828   -1.745    3.807 

Then I plotted the data with a log-transformed x-axis:

library(ggplot2)

ggplot(data.frame(x), aes(x)) + 
  geom_histogram(bins = 100) + 
  scale_x_continuous(trans = "log")

[Image: histogram of x on a natural-log x-axis, tick labels in original units such as 5.8e-62]

Why does log-transforming the data give different numbers from log-scaling the axis in ggplot? The difference shows up on the x-axis: the minimum in the summary is -170.637, while the plot's axis shows values such as 5.8e-62.

Update:

g1 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100)        # raw data
# log-scaled axis, labels still in the original units
g2 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100) + scale_x_continuous(trans = "log")
g3 <- ggplot(data.frame(x), aes(log(x))) + geom_histogram(bins = 100)   # log-transformed data
gridExtra::grid.arrange(g1, g2, g3, ncol=3)

[Image: three side-by-side histograms of x, x on a natural-log axis, and log(x)]

The same comparison with base-10 logs:

g1 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100)        # raw data
# log10-scaled axis, labels still in the original units
g2 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100) + scale_x_log10()
g3 <- ggplot(data.frame(x), aes(log10(x))) + geom_histogram(bins = 100) # log10-transformed data
gridExtra::grid.arrange(g1, g2, g3, ncol=3)

[Image: three side-by-side histograms of x, x on a log10 axis, and log10(x)]

  • FYI, the default breaks for the log transformation are computed by the scales::log_breaks function, which I found informative to look at. (Commented Dec 18, 2017 at 16:02)
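To see where the labels on the natural-log axis come from, here is a minimal sketch (assuming the scales package is installed; `ln_breaks` is just an illustrative name) that calls scales::log_breaks with base e on the data range and checks that every break is a whole power of e:

```r
library(scales)

set.seed(3)
x <- rgamma(1e6, 0.1, .2)

# log_breaks() returns a function; calling it on a range gives break values
ln_breaks <- log_breaks(base = exp(1))(range(x))

ln_breaks       # break values in the original units of x
log(ln_breaks)  # their natural logs: whole numbers (up to floating-point error)
```

The breaks ggplot2 actually draws are computed from the expanded axis limits, so they need not be exactly these values, but the rule is the same: the odd-looking label 5.8e-62 in the question is e^-141.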

Answer


It may be easier to see what is happening if you use scale_x_log10 instead:

ggplot(data.frame(x), aes(x)) + 
  geom_histogram(bins = 100) + 
  scale_x_log10()

gives

[Image: histogram of x on a log10-scaled x-axis]

Then we can do a few things to compare. First, we can relabel the breaks with their base-10 exponents:

# breaks chosen by hand, in the original units of x
myBreaks <- 10^c(-61, -43, -25, -7)

ggplot(data.frame(x), aes(x)) + 
  geom_histogram(bins = 100) + 
  scale_x_log10(breaks = myBreaks,
                labels = log10(myBreaks))  # label each break with its exponent
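Hard-coding labels = log10(myBreaks) only works while the labels are kept in sync with myBreaks. A minimal sketch of an alternative, relying on the fact that a continuous scale's labels argument also accepts a function, which ggplot2 calls with the breaks in original units; `exponent_labels` is a hypothetical helper name:

```r
library(ggplot2)

set.seed(3)
x <- rgamma(1e6, 0.1, .2)

# Hypothetical helper: turn each break (original units) into its base-10 exponent
exponent_labels <- function(breaks) log10(breaks)

p <- ggplot(data.frame(x), aes(x)) +
  geom_histogram(bins = 100) +
  scale_x_log10(labels = exponent_labels)  # default breaks, computed labels

p  # draw the plot
```

With this design the breaks can stay at ggplot2's defaults (or change later) and the labels follow automatically.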

gives

[Image: the same histogram with x-axis labels -61, -43, -25, -7]

We can also get the same plot by transforming x before plotting it:

ggplot(data.frame(x = log10(x)), aes(x)) + 
  geom_histogram(bins = 100)

gives

[Image: histogram of log10(x)]

And we can compare all of these with summary(log10(x)):

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-74.1065  -5.5416  -2.5300  -3.8340  -0.7579   1.6531 
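This summary and the natural-log one in the question differ only by a constant factor: log10(x) = log(x) / log(10), and quantiles and means both scale linearly. A quick base-R check (the names `stats_ln` and `stats_log10` are illustrative):

```r
set.seed(3)
x <- rgamma(1e6, 0.1, .2)

# Min, quartiles, median, max and mean on each log scale
stats_ln    <- c(quantile(log(x)),   mean = mean(log(x)))
stats_log10 <- c(quantile(log10(x)), mean = mean(log10(x)))

# Changing the base only rescales by the constant log(10)
stats_ln / log(10)
all.equal(stats_ln / log(10), stats_log10)
```

For example, -170.637 / log(10) is about -74.11, which matches the minimum above.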

See how that matches up with the graphs above pretty closely?

scale_x_log10 and scale_x_continuous(trans = "log") are not actually changing the data: they change the scaling of the axis (the histogram bins are laid out on the log scale), but leave the labels in the original units.
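One way to check this, as a sketch assuming ggplot2's layer_data() (which returns the computed data of a layer): compare the histogram layer of the log10-scaled plot with that of the plot of log10(x). If the scale only changes how the axis is drawn and labelled, the bins and counts should coincide.

```r
library(ggplot2)

set.seed(3)
x <- rgamma(1e6, 0.1, .2)

g_scale <- ggplot(data.frame(x), aes(x)) +
  geom_histogram(bins = 100) +
  scale_x_log10()                 # log axis, labels in original units
g_data <- ggplot(data.frame(x), aes(log10(x))) +
  geom_histogram(bins = 100)      # data transformed before plotting

# Computed bins: for the scaled plot, x positions are already log10 values
d_scale <- layer_data(g_scale)
d_data  <- layer_data(g_data)

all.equal(d_scale$count, d_data$count)  # same counts per bin
all.equal(d_scale$x, d_data$x)          # same bin centres
```

Both comparisons should return TRUE; only the axis labels of the two plots differ.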

Bringing it back to your original values, log(5.8e-62) is about -141, which is the value you would expect to see if the plot showed the transformed data.
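The conversion is easy to check in base R; exp() goes the other way, from a position on the natural-log axis back to the label printed there:

```r
log(5.8e-62)    # about -141: where the label 5.8e-62 sits on a natural-log axis
exp(-141)       # about 5.8e-62: the label printed at position -141
log10(5.8e-62)  # about -61.2: the same point in base-10 units

# The minimum of summary(log(x)) from the question, back in original units
exp(-170.637)
```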

If you really must have the log values displayed, you can also do the transformation within the mapping, with the added advantage that the axis label defaults to a meaningful name (log10(x)):

ggplot(data.frame(x = x), aes(log10(x))) + 
  geom_histogram(bins = 100)

gives

[Image: histogram of log10(x), with the x-axis titled log10(x)]


Comments

Thanks (+1) for "...are not actually changing the data". Could you explain how to transform x other than with log(x), and have the labels generated automatically from the data?
The simplest response is "Why?" In general, it will be easier to read your graphs if they are plotted in the measured units, rather than log-transformed (this is especially true for log-base-e, which is non-trivial to convert in your head, as evidenced by the fact that you didn't immediately recognize that 5.8e-62 was the same as e^-141). If there is a good reason to convert, convert before plotting (e.g., with log(x) as in the last two plots in my edited answer).
@Prradep, see the edit for an alternative way to accomplish the transformation before plotting.
Thanks, but I have to plot the original data first and then the transformed data, so ggplot(data.frame(x = x), aes(log10(x))) does not work for me.
The question, again, is why you are unable to change this (and why you are unhappy with labeling in the original units). However, you can change the default data of a ggplot object using the %+% operator, e.g. g1 %+% data.frame(x = log10(x)) should do what you want here.
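Following that suggestion, here is a minimal sketch of reusing one plot object for both the original and the transformed data with ggplot2's %+% operator (g1 is rebuilt as in the question; `g1_log` is a hypothetical name):

```r
library(ggplot2)

set.seed(3)
x <- rgamma(1e6, 0.1, .2)

# Plot of the original data, as in the question
g1 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100)

# %+% replaces the plot's default data; the mapping aes(x) is kept,
# so the column x now holds the log10 values
g1_log <- g1 %+% data.frame(x = log10(x))

g1_log  # draw the histogram of the log10 values
```

Because the new data frame reuses the column name x, the mapping and layers of g1 carry over unchanged; only the data behind them is swapped.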
