I have a dataset with two continuous variables, X1 and X2, both of which contain missing values, and I need to impute them. I am using the mice package in R. The difficulty is that one column constrains the other: X1 >= X2 must hold in every row. When I run mice, however, some imputed values violate this constraint.
Here is a minimal working example:
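```r
library(MASS)       # mvrnorm()
library(tidyverse)  # dplyr verbs, pipes and ggplot2
library(mice)

set.seed(123)  # make the example reproducible

p1 <- 0.7   # probability that X1 is observed
p2 <- 0.65  # probability that X2 is observed
sample_size <- 100
sample_meanvector <- c(5, 5)
# A covariance matrix must be symmetric (off-diagonal entries equal)
sample_covariance_matrix <- matrix(c(10, 5, 5, 9), ncol = 2)

mvrnorm(
  n = sample_size,
  mu = sample_meanvector,
  Sigma = sample_covariance_matrix
) %>%
  data.frame() %>%
  as_tibble() %>%
  mutate(R1 = rbinom(sample_size, 1, p1)) %>%
  mutate(R2 = rbinom(sample_size, 1, p2)) %>%
  mutate(X1 = ifelse(R1 == 1, X1, NA)) %>%
  mutate(X2 = ifelse(R2 == 1, X2, NA)) %>%
  dplyr::select(X1, X2) %>%
  # keep rows that satisfy X1 >= X2, or where the check is impossible
  filter(X1 >= X2 | is.na(X1) | is.na(X2)) -> sample_data

# Observed data: every complete point lies on or below the red line X2 = X1
sample_data %>%
  ggplot(aes(x = X1, y = X2)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = 'red')

mice(sample_data, m = 1) -> mids
mice::complete(mids, 1) -> imputed_data  # mice::complete, not tidyr::complete

# Imputed data: any point above the red line violates X1 >= X2
imputed_data %>%
  ggplot(aes(x = X1, y = X2)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = 'red')
```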
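To put a number on the problem rather than eyeballing the plot, a small helper can count the rows that break the constraint. `count_violations` is a hypothetical name introduced here for illustration, not part of mice; it skips rows where either value is missing, so it is meant for completed data.

```r
# Hypothetical helper (not part of mice): number of rows where X1 < X2.
# Rows with a missing X1 or X2 are skipped via na.rm = TRUE.
count_violations <- function(df) {
  sum(df$X1 < df$X2, na.rm = TRUE)
}

# Toy data: only the second row breaks X1 >= X2; the fourth row is incomplete.
toy <- data.frame(X1 = c(3, 1, 4, NA), X2 = c(2, 2, 4, 1))
count_violations(toy)  # 1
```

With the objects from the example above, `count_violations(imputed_data)` gives the count for the single completed data set, and `count_violations(mice::complete(mids, "long"))` pools all imputations.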
I understand that I probably need the `post` argument of `mice()`, but I cannot find documentation detailed enough to cover the case where imputed values are constrained by other values in the same row, which may themselves be imputed. How can I make mice respect X1 >= X2?
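A minimal sketch of how `post` could enforce the constraint, assuming mice 3.x and the default `where` (impute exactly the missing cells). Each element of `post` is a string of R code that mice evaluates right after imputing that variable in each iteration; in that scope `imp[[j]][, i]` holds the imputations for variable `j` in imputation `i`, `data` holds the working data with the latest imputations filled in, and `r` is the response indicator (`TRUE` = observed). The idea is to raise imputed X1 to at least the row's current X2, and lower imputed X2 to at most the row's current X1. Because the variable visited last is clipped against the other's updated value, the final imputations should satisfy X1 >= X2 in every row. The simulated data only mimic the question's setup; the seed, sample size and missingness rates are arbitrary.

```r
library(MASS)  # mvrnorm()
library(mice)

set.seed(2024)  # arbitrary seed, for reproducibility only

# Simulate data resembling the question: keep rows with X1 >= X2,
# then make roughly 30% of X1 and 35% of X2 missing completely at random.
sim <- as.data.frame(mvrnorm(n = 200, mu = c(5, 5),
                             Sigma = matrix(c(10, 5, 5, 9), ncol = 2)))
names(sim) <- c("X1", "X2")
sim <- sim[sim$X1 >= sim$X2, ]
rownames(sim) <- NULL
sim$X1[rbinom(nrow(sim), 1, 0.30) == 1] <- NA
sim$X2[rbinom(nrow(sim), 1, 0.35) == 1] <- NA

# One post-processing string per column; "" means no post-processing.
post <- make.post(sim)

# Run after X1 is imputed: raise any imputed X1 that falls below the
# current X2 of its row (observed or imputed).
post["X1"] <- "imp[[j]][, i] <- pmax(imp[[j]][, i], data$X2[!r[, j]])"
# Run after X2 is imputed: lower any imputed X2 that exceeds the
# current X1 of its row (observed or just imputed).
post["X2"] <- "imp[[j]][, i] <- pmin(imp[[j]][, i], data$X1[!r[, j]])"

imp_post <- mice(sim, m = 5, maxit = 10, post = post, printFlag = FALSE)

# Stack all completed data sets and count rows that break X1 >= X2.
long <- mice::complete(imp_post, action = "long")
n_violations <- sum(long$X1 < long$X2)
n_violations
```

`pmax()`/`pmin()` are used instead of mice's `squeeze()` because `squeeze()` takes a single pair of bounds for the whole column, whereas here the bound differs per row. One side effect of clipping is that draws which would have crossed the line end up exactly on X1 = X2, so it is worth checking whether that pile-up is acceptable for the analysis.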