Create a new variable using some rules in R

Question

dat1 below contain three variables. There are three unique IDs and each has multible records.

ID <- c(rep(1,7), rep(2,6), rep(3,5))
t <- c(seq(1,7), seq(1,6), seq(1,5))
y <- c(rep(6,7), rep(1,6), rep(6,5))
z <- c(6,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,6,NA,NA,NA,NA)
randn <- rnorm(18,0,1)
dat1 <- data.frame(ID, t, y, z, randn)

Notice that for each ID the value of z is non-missing when t is minimum (the first row for each ID).

I need to create a new column called NewX. Note that each cell in the data frame can be expressed as Cell(i,j), where i is the number of ID and j is the number of record. For example, z(1,1) = 6; z(2,1) = 1, and so on.

Case 1: if j = 1 (i.e., the first record for each ID):

NewX(i,1) = rnorm(n=1, mean = z(i,1)*randn(i,1), sd=1)
Case 2: if j > 1 (i.e., not first record for each ID)

NewX(i,j) = rnorm(n=1, mean = randn(i,j)*z(i,j), sd=1), where z(i,j) = z(i,j-1) - NewX(i,j-1)

Your notation is confusing. You should (i) use set.seed to create a reproducible example and (ii) explicitly show the output you expect for that example (in addition to the words you have describing the rules behind it). — Frank
– Frank, Commented Aug 18, 2017 at 20:59
If NewX is a column, how could it have rows and columns? Did you mean it's a new table? — lebelinoz
– lebelinoz, Commented Aug 18, 2017 at 22:08
OP is abusing notation, using "i" to indicate the group (ID) and "j" to indicate the row. Ignore all the i,s and instead think within each ID... Also, in OP's example, j and t are equivalent. — Gregor Thomas
– Gregor Thomas, Commented Aug 18, 2017 at 22:09

Gregor Thomas · Accepted Answer · 2017-08-18 22:22:37Z

I think this works how you want. Rather than doing lots of individual rnorm calls with different means I do a single rnorm with mean 0 and add the mean-adjustment to each.

First a function to do what you want to a single group:

foo = function(dat) {
    NewX = rnorm(nrow(dat))
    NewX[1] = NewX[1] + dat$z[1] * dat$randn[1]
    for (i in 2:nrow(dat)) {
        dat$z[i] = dat$z[i - 1] - NewX[i - 1]
        NewX[i] = NewX[i] + dat$randn[i] * dat$z[i]
    }
    dat$NewX = NewX
    return(dat)
}

Apply the function to each group:

# using base:
do.call(rbind, args = lapply(split(dat1, dat1$ID), foo))

# using dplyr
library(dplyr)
group_by(dat1, ID) %>% do(foo(.))
# # A tibble: 18 x 6
# # Groups:   ID [3]
#       ID     t     y          z      randn       NewX
#    <dbl> <int> <dbl>      <dbl>      <dbl>      <dbl>
#  1     1     1     6  6.0000000  0.9613432  7.4952847
#  2     1     2     6 -1.4952847 -1.3119847  1.8228137
#  3     1     3     6 -3.3180984  0.4025080 -1.2172146
#  4     1     4     6 -2.1008838 -1.8188487  5.8479404
#  5     1     5     6 -7.9488242  0.6298387 -3.5717586
#  6     1     6     6 -4.3770656 -0.6872249  3.2324739
#  7     1     7     6 -7.6095394 -0.5542710  2.8111069
#  8     2     1     1  1.0000000 -0.1773999 -0.7477932
#  9     2     2     1  1.7477932 -1.8299770 -3.1449473
# 10     2     3     1  4.8927405  0.2852126  1.8376771
# 11     2     4     1  3.0550633 -0.5352681 -2.4578430
# 12     2     5     1  5.5129063 -0.6147433 -3.3131580
# 13     2     6     1  8.8260643 -0.3065883  0.3074687
# 14     3     1     6  6.0000000  1.6159438 10.4165718
# 15     3     2     6 -4.4165718  1.1954419 -6.0555754
# 16     3     3     6  1.6390036 -1.1659655 -4.3974029
# 17     3     4     6  6.0364065  0.9377918  6.3873113
# 18     3     5     6 -0.3509048 -1.1887718  0.6909987

Collectives™ on Stack Overflow

Create a new variable using some rules in R

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related