4

I'm trying to simulate deaths over 7 years with the following cumulative probabilities:

tab <- data.frame(id=1:1000,char=rnorm(1000,7,4))

cum.prob <- c(0.05,0.07,0.08,0.09,0.1,0.11,0.12)

How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? Ids sampled in year 1 must not be sampled again in year 2, so lapply(cum.prob, function(x) sample(tab$id, x*1000)) will not work. Is it possible to vectorize this?

//M

2 Answers

7

Here's one way: First get the probability of a given individual's dying in a given year as probYrDeath, i.e. probYrDeath[i] = Prob( individual dies in year i ), where i=1,2,...,7.

probYrDeath <- diff(c(0, cum.prob))

Now generate a random sample of 1000 "Death Years", with replacement, from the sequence 1:8, according to the probabilities in probYrDeath, augmented by the probability of not dying by year 7:

set.seed(1) ## for reproducibility
tab$DeathYr <- sample( 8, 1000, replace = TRUE, 
                       prob = c(probYrDeath, 1-sum(probYrDeath)))

We interpret DeathYr = 8 as "not dying within 7 years", and extract the subset of tab where DeathYr != 8:

tab_sample <- subset(tab, DeathYr != 8 )

You can verify that the cumulative proportions of deaths in each year approximate the values in cum.prob:

> cumsum(table(tab_sample$DeathYr)/1000)
    1     2     3     4     5     6     7 
0.045 0.071 0.080 0.094 0.105 0.115 0.124 
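
The steps above can be wrapped into a small function so that the "survivor" category is derived from the length of cum.prob rather than hard-coded as 8. This is only a sketch: the function name simulate_death_year and the choice to mark survivors as NA are my assumptions, not part of the original answer. Because each id gets exactly one draw, nobody can die in more than one year.

```r
# Hypothetical helper: name and NA encoding are illustrative assumptions
simulate_death_year <- function(n, cum.prob) {
  # Per-year death probabilities, followed by the probability of surviving all years
  probs <- diff(c(0, cum.prob, 1))
  survivor_code <- length(cum.prob) + 1
  yr <- sample(survivor_code, n, replace = TRUE, prob = probs)
  yr[yr == survivor_code] <- NA  # NA marks "did not die within the horizon"
  yr
}

set.seed(1)
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
tab$DeathYr <- simulate_death_year(nrow(tab), cum.prob)

# Empirical cumulative death proportions, to compare against cum.prob
cumsum(table(factor(tab$DeathYr, levels = seq_along(cum.prob)))) / nrow(tab)
```

The exact proportions depend on the seed and R version, so they should only be expected to land near cum.prob, not match it.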

4 Comments

Nice work. Very direct approach. Sidesteps all the complications I was adding.
Ooh. Way better than mine! I didn't realize you can pass probabilities to sample. And the diff is also groovy! You could set the 8 based on the length(cum.prob) + 1 though.
Very intuitive. Appreciate it. Thx. //M
You can also avoid the c(probYrDeath, 1-sum(probYrDeath)) complication by defining probYrDeath <- diff(c(0, cum.prob, 1)) in the earlier line and passing it directly as prob.
0

Does this work for you:

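# First entry: probability of surviving all years; the rest: per-year death probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], cum.prob - c(0, cum.prob[-length(cum.prob)]))
# Draw the number of deaths in each year (the survivor count is dropped)
dead.in.years <- as.vector(rmultinom(1, length(tab$id), prob.death.per.year))[-1]
totsamp <- sum(dead.in.years)
# Sample that many distinct ids and assign them to years in order
data.frame(id = sample(tab$id, totsamp), dead.after = rep(seq_along(dead.in.years), dead.in.years))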

Depending upon which form you want the result in, you can change the last step.
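
As one possible variation on the last step, here is a sketch that returns a list with one vector of ids per year instead of a data frame. The names ids, deaths and ids.by.year are just illustrative, and the use of split() is my choice rather than something the answer prescribes.

```r
set.seed(1)
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)
ids <- 1:1000

# Same construction as above: survivor probability first, then per-year probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], diff(c(0, cum.prob)))
dead.in.years <- as.vector(rmultinom(1, length(ids), prob.death.per.year))[-1]
deaths <- data.frame(id = sample(ids, sum(dead.in.years)),
                     dead.after = rep(seq_along(dead.in.years), dead.in.years))

# One vector of ids per year; explicit factor levels keep years with zero deaths
ids.by.year <- split(deaths$id,
                     factor(deaths$dead.after, levels = seq_along(cum.prob)))
```

Since sample() draws without replacement here, no id can appear in more than one year's vector.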
