I have the following skewed data:
set.seed(3)
x <- rgamma(1e6, 0.1, .2)
I looked at the log-transformed distribution of the data:
summary(log(x))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -170.637 -12.760 -5.825 -8.828 -1.745 3.807
Visualizing the data with a log-transformed x-axis:
library(ggplot2)
ggplot(data.frame(x), aes(x)) +
  geom_histogram(bins = 100) +
  scale_x_continuous(trans = "log")
What is the reason for the difference between log-transforming the data and using a log scale in ggplot? The x-axes do not match: the minimum value in the summary is -170.637, while the axis of the plot shows values such as 5.8e-62.
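To make the mismatch concrete, here is a minimal sketch in base R that converts between the two sets of numbers. It assumes the log-scaled plot positions points at log(x) but labels the axis in the original units of x. The variable names are illustrative and not part of the original code.

```r
# Axis label read off the log-scaled plot (value quoted in the question).
axis_label <- 5.8e-62
# Minimum of log(x) as reported by summary(log(x)).
min_log <- -170.637

# Assumption: the plot positions points at log(x) but labels them as x.
log(axis_label)  # position of that label in log space, roughly -141
exp(min_log)     # the summary minimum expressed in the units of x
```

If that assumption holds, both sets of numbers describe the same axis, once in log units and once in the units of x.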
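Update: comparing the untransformed data, the log-scaled axis, and the pre-transformed data, first with the natural log and then with log10: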
g1 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100)
g2 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100) + scale_x_continuous(trans = "log")
g3 <- ggplot(data.frame(x), aes(log(x))) + geom_histogram(bins = 100)
gridExtra::grid.arrange(g1, g2, g3, ncol=3)
# Same comparison using base-10 logs
g1 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100)
g2 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100) + scale_x_log10()
g3 <- ggplot(data.frame(x), aes(log10(x))) + geom_histogram(bins = 100)
gridExtra::grid.arrange(g1, g2, g3, ncol=3)
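One way to check whether the middle and right panels differ only in their axis labels is to inspect the data ggplot2 computes for each layer. The sketch below uses ggplot_build() for that. It assumes ggplot2 is installed, uses a smaller sample than the question to keep it quick, and the names b2 and b3 are hypothetical.

```r
library(ggplot2)

set.seed(3)
x <- rgamma(1e4, 0.1, 0.2)  # smaller sample than in the question, for speed

g2 <- ggplot(data.frame(x), aes(x)) + geom_histogram(bins = 100) +
  scale_x_continuous(trans = "log")
g3 <- ggplot(data.frame(x), aes(log(x))) + geom_histogram(bins = 100)

# Extract the computed histogram layer (bin centers and counts) of each plot.
b2 <- ggplot_build(g2)$data[[1]]
b3 <- ggplot_build(g3)$data[[1]]

# If the scale transforms x before binning, both ranges should be in log units.
range(b2$x)
range(b3$x)
range(log(x))
```

If the first two ranges agree with range(log(x)), the bins are computed in log space in both plots and only the labeling of the axis differs.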







scales::log_breaks function, which I found informative to look at.
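As a rough illustration of what that function does, the sketch below calls scales::log_breaks() directly on a range of x. It assumes the scales package is installed. The range is taken from the summary(log(x)) output above, and x_range, breaks_e and breaks_10 are hypothetical names.

```r
library(scales)

# Approximate range of x implied by summary(log(x)) in the question.
x_range <- exp(c(-170.637, 3.807))

# log_breaks() returns a function that picks break positions for a range.
breaks_e  <- log_breaks(base = exp(1))(x_range)  # natural-log spacing
breaks_10 <- log_breaks(base = 10)(x_range)      # base-10 spacing

# The breaks are in the units of x; taking logs shows where they sit in log space.
log(breaks_e)
log10(breaks_10)
```

The breaks come back in the units of x, which would explain why a log-scaled axis shows labels like 5.8e-62 rather than values like -141.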