1

I am using a package called BetaMixture in R to fit a mixture of beta distributions for a data vector. The output is supplied to a hist that produces a good histogram with the mixture model components:

# Install and load the libraries

#install.packages("BetaModels")
library(BetaModels)

# Create a vector, fit mixture models and plot the histogram

vec <- c(rbeta(700, 5, 2), rbeta(300, 1, 10))
model <- BetaMixture(vec,2)
h <- hist(model, breaks = 35)

enter image description here

So far so good. Now how do I get this in ggplot? I inspected the h object but that is not different from the model object. They are exactly the same. I don't know how this hist even works for this class. What does it pull from the model to generate this plot other than the @datavec?

0

2 Answers 2

3

You can get the hist function for BetaMixed objects using getMethod("hist", "BetaMixture").
Below you can find a simple translation of this function into the "ggplot2 world".

myhist <- function (x, ...) {
    .local <- function (x, mixcols = 1:7, breaks=25, ...) 
    {
        df1 <- data.frame(x=x@datavec)
        p <- ggplot(data=df1, aes(x=x)) + 
             geom_histogram(aes(y=..density..), bins=breaks, alpha=0.5, fill="gray50", color="black")
        while (length(mixcols) < ncol(x@mle)) mixcols <- c(mixcols, 
            mixcols)
        xv <- seq(0, 1, length = 502)[1:501]
        for (J in 1:ncol(x@mle)) {
            y <- x@phi[J] * dbeta(xv, x@mle[1, J], x@mle[2, J])
            df2 <- data.frame(xv, y)
            p <- p + geom_line(data=df2, aes(xv, y), size=1, col=mixcols[J])
        }
        p <- p + theme_bw()
        invisible(p)
    }
    .local(x, ...)
}

library(ggplot2)
# Now p is a ggplot2 object.
p <- myhist(model, breaks=35)
print(p)

enter image description here

Sign up to request clarification or add additional context in comments.

2 Comments

This is beautiful, thanks!
Also, your explanation using the getMethod to read local functions is an eye-opener. Thanks!
2

The object returned by BetaMixture is a S4 class object with 2 slots that are of interest.

Slot Z returns a matrix of the probabilities of each data point belonging to each of the distributions.
So in the first 6 rows, all points belong to the 2nd distribution.

head(model@Z)
#             [,1]      [,2]
#[1,] 1.354527e-04 0.9998645
#[2,] 4.463074e-03 0.9955369
#[3,] 1.551999e-03 0.9984480
#[4,] 1.642579e-03 0.9983574
#[5,] 1.437047e-09 1.0000000
#[6,] 9.911427e-04 0.9990089

And slot mle return the maximum likelihood estimates of the parameters.

Now use those values to create a data.frame of the vector and a data.frame of the parameters.

df1 <- data.frame(vec)
df1$component <- factor(apply(model@Z, 1, which.max))
colors <- as.integer(levels(df1$component))

params <- as.data.frame(t(model@mle))
names(params) <- c("shape1", "shape2")

Plot the data.

library(ggplot2)

g <- ggplot(df1, aes(x = vec, group = component)) +
  geom_histogram(aes(y = ..density..),
                 bins = 35, fill = "grey", color = "grey40")

for(i in 1:nrow(params)){
  sh1 <- params$shape1[i]
  sh2 <- params$shape2[i]
  g <- g + stat_function(
    fun = dbeta,
    args = list(shape1 = sh1, shape2 = sh2),
    color = colors[i]
  )
}
suppressWarnings(print(g + theme_bw()))

enter image description here

3 Comments

Thank you so much for the detailed explanation, especially the Z slot. I was thinking they are the actual component values than the probabilities. I didn't know that there was a local hist method specifically developed for this package. It's a very good package, I should say! Also, your solution using the probabilities is very nice.
However, for the general fitting problem, you need to use the weights parameters as well, which is given by @phi in the model. Otherwise, the components will blow up.
@user2167741 Yes, that's true, I considered using the weights. But in this case it works without them so I chose not to complicate.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.