4

I want to plot the loglikelihood function of a series of independent Bernoulli distributed random variables y with parameter p being a function (the logistic function) of some feature x. This logistic function also has a parameter b. This is the parameter I want to estimate. So I want to plot the loglikelihood as a function of b. I want to do so in R using ggplot2, because I want to become better in those.

My creation of the loglikelihood function can and should be done better, but that is not my point here. The problem is that the plotted loglikelihood is constant over the interval (-5,5). That seems wrong. Especially, because when I call the function with some arbitrary b in that interval it returns a different value. Why does this happen? Thank you.

library(ggplot2)
set.seed(123)

# parameters 
n=100
mu=0
s=2
b<-0.2

# functions 
logit <- function(x,b){1/(1+exp(-b*x))}

# simulation of data
x<-rnorm(n,mu,s)
y_prob<-logit(x,b)
y<-rbinom(n,1,y_prob)
df<-data.frame(x,y)

# loglikelihood function 
loglikelihood<-function(b,df){
  prd<-1
  for (i in 1:NROW(df)){
    events<-logit(df$x[i],b)
    nonevents<-1-events
    prd<-prd*events^df$y[i]*nonevents^(1-df$y[i])
  }
  return(sum(log(prd)))
}


loglikelihood(0.3,df)

p2<-ggplot(data=data.frame(b=c(-5,5)), aes(b)) + stat_function(fun=loglikelihood, args=list(df=df))
p2<-p2+xlab("b") + ylab("loglikelihood")
p2

1 Answer 1

4

The problem is your loglikelihood function. You must pass a "vectorized" funciton to stat_function. Most functions in R will return a vector if you pass in a vector. For example sin(1:10) will return the sine of the numbers 1 through 10. However, when a vector of values is passed to your function, only one value is returned

loglikelihood(seq(-5,5, by=.1), df)
# [1] -20534.44

Since it doesn't behave like a "normal" R function, you have this problem. The easiest way to fix this is to wrap your function definition in a Vectorize command. Observe

vloglikelihood <- Vectorize(loglikelihood, vectorize.args="b")
vloglikelihood(seq(-5,5, by=.1), df)
# [1] -463.67919 -454.67142 -445.66980 -436.67470 -427.68654 -418.70574 ...

Now it vloglikelihood behaves like a good R function should. Then we can plot it as you were doing

ggplot(data=data.frame(b=c(-5,5)), aes(b)) + 
    stat_function(fun=vloglikelihood, args=list(df=df)) +
    xlab("b") + ylab("loglikelihood")

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.