0

dat1 below contain three variables. There are three unique IDs and each has multible records.

ID <- c(rep(1,7), rep(2,6), rep(3,5))
t <- c(seq(1,7), seq(1,6), seq(1,5))
y <- c(rep(6,7), rep(1,6), rep(6,5))
z <- c(6,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,6,NA,NA,NA,NA)
randn <- rnorm(18,0,1)
dat1 <- data.frame(ID, t, y, z, randn)

Notice that for each ID the value of z is non-missing when t is minimum (the first row for each ID).

I need to create a new column called NewX. Note that each cell in the data frame can be expressed as Cell(i,j), where i is the number of ID and j is the number of record. For example, z(1,1) = 6; z(2,1) = 1, and so on.

  • Case 1: if j = 1 (i.e., the first record for each ID):

    NewX(i,1) = rnorm(n=1, mean = z(i,1)*randn(i,1), sd=1)

  • Case 2: if j > 1 (i.e., not first record for each ID)

    NewX(i,j) = rnorm(n=1, mean = randn(i,j)*z(i,j), sd=1), where z(i,j) = z(i,j-1) - NewX(i,j-1)

3
  • 3
    Your notation is confusing. You should (i) use set.seed to create a reproducible example and (ii) explicitly show the output you expect for that example (in addition to the words you have describing the rules behind it). Commented Aug 18, 2017 at 20:59
  • If NewX is a column, how could it have rows and columns? Did you mean it's a new table? Commented Aug 18, 2017 at 22:08
  • OP is abusing notation, using "i" to indicate the group (ID) and "j" to indicate the row. Ignore all the i,s and instead think within each ID... Also, in OP's example, j and t are equivalent. Commented Aug 18, 2017 at 22:09

1 Answer 1

2

I think this works how you want. Rather than doing lots of individual rnorm calls with different means I do a single rnorm with mean 0 and add the mean-adjustment to each.

First a function to do what you want to a single group:

foo = function(dat) {
    NewX = rnorm(nrow(dat))
    NewX[1] = NewX[1] + dat$z[1] * dat$randn[1]
    for (i in 2:nrow(dat)) {
        dat$z[i] = dat$z[i - 1] - NewX[i - 1]
        NewX[i] = NewX[i] + dat$randn[i] * dat$z[i]
    }
    dat$NewX = NewX
    return(dat)
}

Apply the function to each group:

# using base:
do.call(rbind, args = lapply(split(dat1, dat1$ID), foo))

# using dplyr
library(dplyr)
group_by(dat1, ID) %>% do(foo(.))
# # A tibble: 18 x 6
# # Groups:   ID [3]
#       ID     t     y          z      randn       NewX
#    <dbl> <int> <dbl>      <dbl>      <dbl>      <dbl>
#  1     1     1     6  6.0000000  0.9613432  7.4952847
#  2     1     2     6 -1.4952847 -1.3119847  1.8228137
#  3     1     3     6 -3.3180984  0.4025080 -1.2172146
#  4     1     4     6 -2.1008838 -1.8188487  5.8479404
#  5     1     5     6 -7.9488242  0.6298387 -3.5717586
#  6     1     6     6 -4.3770656 -0.6872249  3.2324739
#  7     1     7     6 -7.6095394 -0.5542710  2.8111069
#  8     2     1     1  1.0000000 -0.1773999 -0.7477932
#  9     2     2     1  1.7477932 -1.8299770 -3.1449473
# 10     2     3     1  4.8927405  0.2852126  1.8376771
# 11     2     4     1  3.0550633 -0.5352681 -2.4578430
# 12     2     5     1  5.5129063 -0.6147433 -3.3131580
# 13     2     6     1  8.8260643 -0.3065883  0.3074687
# 14     3     1     6  6.0000000  1.6159438 10.4165718
# 15     3     2     6 -4.4165718  1.1954419 -6.0555754
# 16     3     3     6  1.6390036 -1.1659655 -4.3974029
# 17     3     4     6  6.0364065  0.9377918  6.3873113
# 18     3     5     6 -0.3509048 -1.1887718  0.6909987
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.