I've run into a problem trying to access other columns/variables when setting a post-processing rule using the mice package in R.

The simplified data I have are structured as:

participant_id, date, lab_measurement, covariate_1, covariate_2

Each participant has multiple measurements (but all on different dates). I've got the data in long format, so each participant has multiple rows, with each representing a different date. Naturally, some measurements are missing, and I'm using MICE to impute only the missing values in lab_measurement.

The difficulty is that, within each participant, the measurements are correlated with the values at the previous date. To account for this, I've created an additional column, "previous_lab", that is based on the values in lab_measurement. I then specify the regression model used for imputation of lab_measurement as:

lab_measurement ~ previous_lab + covariate_1 + covariate_2

I would like previous_lab to update after each iteration in the imputation algorithm. The obvious issue is that the first lab_measurement cannot have a previous_lab, but it is safe to assume that everyone comes in with a pre-study measurement of 100.
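For reference, here is a minimal base R sketch of how previous_lab is derived. The toy data frame and the helper name lag_with_baseline are made up for illustration, and the rows are assumed to be sorted by date within each participant:

```r
# Hypothetical toy data; rows are assumed sorted by date within participant
toy <- data.frame(
  participant_id  = factor(c("a", "a", "a", "b", "b")),
  lab_measurement = c(90, NA, 110, 105, 95)
)

# Shift each participant's series down by one row and put the assumed
# pre-study value (100) in the first position
lag_with_baseline <- function(x, baseline = 100) {
  c(baseline, x[-length(x)])
}

toy$previous_lab <- ave(toy$lab_measurement, toy$participant_id,
                        FUN = lag_with_baseline)
print(toy)
```

A missing lab_measurement carries over as a missing previous_lab on the next row, which is why I want previous_lab recomputed once lab_measurement has been imputed.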

To accomplish this, I tried the following:

imp <- mice(data, maxit = 0)

imp$post["previous_lab"] <- "imp[[j]][, 'previous_lab'] <- ave(imp[[j]][, 'lab_measurement'], imp[[j]][, 'participant_id'], FUN = function(x) {c(100, x[-length(x)])})"

predictor_matrix <- matrix(0, nrow=ncol(data), ncol=ncol(data))
rownames(predictor_matrix) <- colnames(data)
colnames(predictor_matrix) <- colnames(data)
predictor_matrix["lab_measurement", c("previous_lab", "covariate_1", "covariate_2")] <- 1

imp <- mice(data, m = 5, predictorMatrix = predictor_matrix, post = imp$post, maxit = 10, seed = 123)

Unfortunately, R spits out the following error:

Error in `[.data.frame`(imp[[j]], , "participant_id") : undefined columns selected

A couple of things that I've done to trouble-shoot:

I've asked the post-processing rule to print the column names, by doing the following:

imp$post["previous_lab"] <- "
print(colnames(imp[[j]]))
imp[[j]][, 'previous_lab'] <- ave(imp[[j]][, 'lab_measurement'], imp[[j]][, 'participant_id'], FUN = function(x) {c(100, x[-length(x)])})"

And then I get the following output:

iter imp variable
 1 1 lab_measurement previous_lab[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
Error in `[.data.frame`(imp[[j]], , "participant_id") :
  undefined columns selected

(sorry, I can't seem to format the above neatly)

From this it seems that R has converted the column names to numbers, but then the numbers should go up to 5, not 10. When I change the number of imputations, the print-out of column names/numbers goes up to the number of imputations requested.

I've double-checked that participant_id has been converted from a character to a factor variable.

Last, I've had a look at the vignette written by Gerko Vink and Stef van Buuren (https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html). Here they applied post-processing by accessing a variable using the undefined index i:

post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"

I'm not sure how exactly R knows that "i" needs to get to the "tv" variable.
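My best guess, and it is only a guess about mice's internals, is that the post string is evaluated with eval(parse(text = ...)) inside a loop in which j is the name of the variable being imputed and i is the imputation number. That would also fit the numeric column names I printed above. Below is a toy base R sketch of that mechanism. The imp list here is a hand-made stand-in, not a real mice object, and pmin/pmax stand in for squeeze:

```r
# Hypothetical stand-in for the imputation store: one element per variable,
# each a data frame with one column per imputation (m = 3 here)
imp <- list(
  tv = data.frame(`1` = c(0, 30), `2` = c(12, 40), `3` = c(-5, 20),
                  check.names = FALSE)
)

# Post-processing rule stored as a string, with free variables i and j
post <- c(tv = "imp[[j]][, i] <- pmin(pmax(imp[[j]][, i], 1), 25)")

# The string only becomes meaningful once i and j exist in the
# environment where it is evaluated
for (i in seq_len(3)) {
  for (j in names(post)) {
    eval(parse(text = post[j]))
  }
}

print(colnames(imp$tv))
print(imp$tv)
```

If that is how it works, imp[[j]] only ever holds the imputed values for variable j, one column per imputation, so columns such as participant_id would not be found there.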

It looks like post-processing can only be applied "within" a variable, and one cannot access other variables during post-processing. Would be greatly appreciated if anyone has grappled with this and knows a solution!
