I am trying to compute a new variable from the values of several existing variables, using conditional computation. Specifically, the new variable is kidney function (eGFR), which is estimated from one's sex, age, being (non-)Negroid, and the concentrations of two blood compounds (i.e., creatinine and cystatin C).
I have tried to accomplish this with R's if...else statement, but encountered a warning message, after which nothing happens. All variables are contained in the data frame 'd'.
Basically, what I would like R to do is: if a subject is male (sex == 1) and non-Negroid (race != 1), and has a blood creatinine ≤ 0.9 and cystatin C ≤ 0.8, then kidney function is estimated through:
eGFR = 135 × (creatinine / 0.9)^(-0.207) × (cystatinC / 0.8)^(-0.375) × 0.995^age
and so forth for the other combinations. To this end, I applied the following piece of code:
if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid males
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid females
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race != 1){
d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid males
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid females
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
} else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race == 1){
d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
}
However, when I run this, R yields:
Warning message:
In if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & :
the condition has length > 1 and only the first element will be used
Can anyone help me out?
UPDATE: Below are some sample data, including age, sex (0=female, 1=male), race (1=Negroid, != 1 being non-Negroid), creatinine, cystatin C, and the manually calculated eGFR for formula verification purposes:
reconstruct <- structure(list(sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), race = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("0", "1", "2",
"3", "4"), class = "factor"), age = c(71.9425051334702, 65.1964407939767,
46.2258726899384, 51.7152635181383, 54.8747433264887, 71.6714579055441,
36.0793976728268, 54.3764544832307, 57.9110198494182, 49.9438740588638
), creatinine = c(0.633484162895928, 0.984162895927602, 0.769230769230769,
0.8710407239819, 0.769230769230769, 0.690045248868778, 0.893665158371041,
1.02941176470588, 0.83710407239819, 0.701357466063348), cystatinC = c(0.73,
0.85, 0.64, 0.9, 0.83, 0.95, 1.04, 1, 0.95, 0.68), eGFR = c(96.1605293085191,
73.17567750685, 105.934761135043, 80.8974371814808, 103.186483803272,
88.1306212690947, 77.7383905116244, 66.9892381719287, 90.7223944432609,
107.443909414004)), row.names = c(NA, 10L), class = "data.frame")
[...] if([false true true false]), which does not make sense. So R cuts this to the first value. You need to use other logical operators. Try %in% or something like this, which returns just one logical value.

[...] any() around all your conditions, but this fixes your syntax. Try to think about vectorized operations like ifelse.

[...] ifelse() function rather than the if ... else construct, but with so many conditions that would be neither efficient nor readable.

[...] d$sex and d$race and develop simplified formulas for the resulting 4 factor levels (formulas that can be abstracted to their own function definitions) and then populate the new variable in a more readable way by using this factor and the new functions.

[...] dput format? If yes, edit the question with the output of dput(head(d, 20)).
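For illustration, here is a minimal sketch of the vectorized approach the comments point toward. Instead of one if ... else branch per combination, each piece of the formula (scale constant, creatinine cutoff and exponent, cystatin C exponent) is chosen per row with ifelse(). The function name egfr_combined and the toy data frame are hypothetical; the constants are copied from the if/else chain in the question, and the as.character() conversion assumes sex and race may be stored as factors with labels such as "0" and "1", as in the dput sample. As far as I know, from R 4.2.0 onward a length > 1 condition in if() is an error rather than a warning, so the original chain would not run at all there.

```r
# Hypothetical helper: a vectorized version of the if/else chain in the question.
# All numeric constants are taken from that chain; nothing else is assumed.
egfr_combined <- function(sex, race, age, creatinine, cystatinC) {
  # Factors would otherwise compare on their labels or internal codes,
  # so convert labels such as "0"/"1" to plain numbers first.
  sex  <- as.numeric(as.character(sex))
  race <- as.numeric(as.character(race))

  male   <- sex == 1    # anything else is treated as female here (assumption)
  race_1 <- race == 1   # the group the question codes as race == 1

  # Sex-specific creatinine cutoff and low-range exponent
  kappa <- ifelse(male, 0.9, 0.7)
  alpha <- ifelse(male, -0.207, -0.248)

  # Scale constant for the four sex-by-race groups
  scale <- ifelse(male,
                  ifelse(race_1, 145.8, 135),
                  ifelse(race_1, 140.4, 130))

  # Piecewise exponents, chosen row by row
  cr_exp  <- ifelse(creatinine <= kappa, alpha, -0.601)
  cys_exp <- ifelse(cystatinC <= 0.8, -0.375, -0.711)

  scale * (creatinine / kappa)^cr_exp * (cystatinC / 0.8)^cys_exp * 0.995^age
}

# Toy data (made up for this example, not taken from the question)
toy <- data.frame(
  sex        = c(1, 0, 1, 0),
  race       = c(0, 1, 1, 2),
  age        = c(50, 60, 40, 70),
  creatinine = c(0.8, 1.0, 1.2, 0.6),
  cystatinC  = c(0.7, 0.9, 0.75, 1.1)
)

toy$eGFR <- egfr_combined(toy$sex, toy$race, toy$age,
                          toy$creatinine, toy$cystatinC)
```

With the question's data frame, the equivalent call would be d$eGFR <- egfr_combined(d$sex, d$race, d$age, d$creatinine, d$cystatinC). Rows with a missing sex, race, creatinine, or cystatin C value come out as NA rather than being silently assigned to a group.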