
I am trying to compute a new variable from the values of several existing variables, using conditional computation. Specifically, the new variable is kidney function (eGFR), which is estimated from a person's sex, age, being (non-)Negroid, and the blood concentrations of two compounds (i.e., creatinine and cystatin C).

I tried to accomplish this with R's if...else statement, but got a warning message, after which nothing happens. All variables are contained in the data frame 'd'.

Basically, what I would like R to do is the following: if a subject is male (sex == 1) and non-Negroid (race != 1), and has a blood creatinine ≤ 0.9 and a cystatin C ≤ 0.8, then their kidney function is estimated with:

$$eGFR = 135 \cdot \left(\frac{creatinine}{0.9}\right)^{-0.207} \cdot \left(\frac{cystatinC}{0.8}\right)^{-0.375} \cdot 0.995^{age}$$

and so on for the other combinations of sex, race, and creatinine/cystatin C thresholds. To this end I wrote the following code:

if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid males
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 135 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race != 1){ ### Non-Negroid females
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race != 1){
    d$eGFR <- 130 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid males
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.207) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC <= 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 1 & d$creatinine > 0.9 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 145.8 * I((d$creatinine / 0.9)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC <= 0.8 & d$race == 1){ ### Negroid females
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine <= 0.7 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.248) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC <= 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.375) * I((0.995)^d$age)
  } else if (d$sex == 0 & d$creatinine > 0.7 & d$cystatinC > 0.8 & d$race == 1){
    d$eGFR <- 140.4 * I((d$creatinine / 0.7)^-0.601) * I((d$cystatinC / 0.8)^-0.711) * I((0.995)^d$age)
  }

However, running this code gives the following warning:

Warning message:
In if (d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 &  :
  the condition has length > 1 and only the first element will be used

Can anyone help me out?

UPDATE: Below are some sample data, including age, sex (0 = female, 1 = male), race (1 = Negroid, any other value = non-Negroid), creatinine, cystatin C, and the manually calculated eGFR for verifying the formula:

reconstruct <- structure(list(sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 
2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), race = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("0", "1", "2", 
"3", "4"), class = "factor"), age = c(71.9425051334702, 65.1964407939767, 
46.2258726899384, 51.7152635181383, 54.8747433264887, 71.6714579055441, 
36.0793976728268, 54.3764544832307, 57.9110198494182, 49.9438740588638
), creatinine = c(0.633484162895928, 0.984162895927602, 0.769230769230769, 
0.8710407239819, 0.769230769230769, 0.690045248868778, 0.893665158371041, 
1.02941176470588, 0.83710407239819, 0.701357466063348), cystatinC = c(0.73, 
0.85, 0.64, 0.9, 0.83, 0.95, 1.04, 1, 0.95, 0.68), eGFR = c(96.1605293085191, 
73.17567750685, 105.934761135043, 80.8974371814808, 103.186483803272, 
88.1306212690947, 77.7383905116244, 66.9892381719287, 90.7223944432609, 
107.443909414004)), row.names = c(NA, 10L), class = "data.frame")
  • You are comparing a whole vector with one value, which returns a logical vector like [FALSE TRUE TRUE FALSE]. In the end you are asking if ([FALSE TRUE TRUE FALSE]), which does not make sense, so R cuts this down to the first value. You need other logical operators. Try %in% or something similar that returns just one logical value. Commented Jul 6, 2019 at 12:21
  • You could also try wrapping all your conditions in any(), but that only fixes the syntax. Try to think in terms of vectorized operations such as ifelse(). Commented Jul 6, 2019 at 12:29
  • You could use the ifelse() function rather than the if ... else construct, but with so many conditions that would be neither efficient nor readable. Commented Jul 6, 2019 at 12:29
  • Perhaps you could first create a new factor variable based on d$sex and d$race and develop simplified formulas for the resulting 4 factor levels (formulas that can be abstracted to their own function definitions) and then populate the new variable in a more readable way by using this factor and the new functions. Commented Jul 6, 2019 at 12:45
  • Can you post sample data, preferably covering all cases, in dput format? If so, edit the question with the output of dput(head(d, 20)). Commented Jul 6, 2019 at 12:56
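To make the comments' point concrete, here is a minimal sketch on hypothetical toy data (the values are made up; only the column names follow the question). It shows that the combined condition is a logical vector with one element per row, and then uses logical-index assignment, a swapped-in alternative to ifelse(), to fill in the first of the sixteen cases only for the rows that meet it.

```r
# Hypothetical toy data with the same column names as the question's `d`
d <- data.frame(
  sex        = c(1, 1, 0),
  race       = c(0, 0, 0),
  age        = c(50, 60, 40),
  creatinine = c(0.8, 1.1, 0.6),
  cystatinC  = c(0.7, 0.9, 0.75)
)

# The combined condition has one TRUE/FALSE per row, not a single value
cond <- d$sex == 1 & d$creatinine <= 0.9 & d$cystatinC <= 0.8 & d$race != 1
print(cond)

# Passing `cond` to if () would only inspect cond[1]: a warning before
# R 4.2.0, an error from R 4.2.0 on. Instead, assign only to matching rows:
d$eGFR <- NA_real_
d$eGFR[cond] <- 135 * (d$creatinine[cond] / 0.9)^(-0.207) *
  (d$cystatinC[cond] / 0.8)^(-0.375) * 0.995^d$age[cond]
print(d)
```

Repeating the last assignment for each of the sixteen conditions would work, but it is verbose; the answer below collapses the cases instead.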

1 Answer


I believe the function below follows what is defined in the question, but it is untested, since there are no data or expected output.

eGFRfun <- function(DF){
  i_sex  <- DF[["sex"]] == 1               # TRUE for males
  i_race <- DF[["race"]] == 1              # TRUE for Negroid subjects
  creat  <- DF[["creatinine"]]
  cystC  <- DF[["cystatinC"]]

  # Constant: 135 (male) or 130 (female), times 1.08 if race == 1 (145.8 / 140.4)
  const_fac <- ifelse(i_sex, 135, 130) * ifelse(i_race, 1.08, 1)
  # Creatinine threshold and denominator depend on sex
  creat_denom <- ifelse(i_sex, 0.9, 0.7)
  # At or below the threshold: -0.207 (male) / -0.248 (female); above it: -0.601
  creat_pow <- ifelse(creat <= creat_denom, ifelse(i_sex, -0.207, -0.248), -0.601)
  cystC_fac <- (cystC / 0.8)^ifelse(cystC <= 0.8, -0.375, -0.711)
  age_fac   <- 0.995^DF[["age"]]

  const_fac * (creat / creat_denom)^creat_pow * cystC_fac * age_fac
}
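The sixteen branches in the question collapse into one expression with four case-dependent parameters, which is the structure eGFRfun() builds with ifelse(). Written out (a summary of the question's code, not checked against the clinical reference):

$$
eGFR = A \cdot \left(\frac{creatinine}{\kappa}\right)^{\alpha} \cdot \left(\frac{cystatinC}{0.8}\right)^{\beta} \cdot 0.995^{age}
$$

$$
A = \begin{cases} 135 & \text{male} \\ 130 & \text{female} \end{cases} \times \begin{cases} 1.08 & race = 1 \\ 1 & \text{otherwise} \end{cases},
\qquad
\kappa = \begin{cases} 0.9 & \text{male} \\ 0.7 & \text{female} \end{cases}
$$

$$
\alpha = \begin{cases} -0.207 & \text{male},\ creatinine \le \kappa \\ -0.248 & \text{female},\ creatinine \le \kappa \\ -0.601 & creatinine > \kappa \end{cases},
\qquad
\beta = \begin{cases} -0.375 & cystatinC \le 0.8 \\ -0.711 & cystatinC > 0.8 \end{cases}
$$

For example, 135 × 1.08 = 145.8 and 130 × 1.08 = 140.4, the constants used in the question's Negroid branches.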

Example usage:

d$eGFR <- eGFRfun(d)
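As a sketch of another way to write the same piecewise formula, and to check it against the sample data from the question's update, the block below uses pmin()/pmax() instead of nested ifelse() calls: min(x, 1)^a · max(x, 1)^b equals x^a when x ≤ 1 and x^b when x > 1, so each threshold becomes a product of two factors. The function name eGFR_minmax and its male/black arguments are hypothetical; passing the group indicators explicitly leaves the coding decision with the caller. One caveat from checking the sample data by hand: the posted eGFR values seem to match the formula only if sex is read as 0 = male, 1 = female, the opposite of the coding stated in the update (row 1, sex "1", gives 96.16 with the female constants and about 104.8 with the male ones). The post does not make clear whether the stated coding or the manual calculation is off, so the check computes both readings.

```r
# Sample data copied from the question's update
reconstruct <- structure(list(sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), race = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("0", "1", "2",
"3", "4"), class = "factor"), age = c(71.9425051334702, 65.1964407939767,
46.2258726899384, 51.7152635181383, 54.8747433264887, 71.6714579055441,
36.0793976728268, 54.3764544832307, 57.9110198494182, 49.9438740588638
), creatinine = c(0.633484162895928, 0.984162895927602, 0.769230769230769,
0.8710407239819, 0.769230769230769, 0.690045248868778, 0.893665158371041,
1.02941176470588, 0.83710407239819, 0.701357466063348), cystatinC = c(0.73,
0.85, 0.64, 0.9, 0.83, 0.95, 1.04, 1, 0.95, 0.68), eGFR = c(96.1605293085191,
73.17567750685, 105.934761135043, 80.8974371814808, 103.186483803272,
88.1306212690947, 77.7383905116244, 66.9892381719287, 90.7223944432609,
107.443909414004)), row.names = c(NA, 10L), class = "data.frame")

# Hypothetical helper: the question's piecewise formula via pmin()/pmax().
# `male` and `black` are logical vectors with one element per row of DF.
eGFR_minmax <- function(DF, male, black) {
  A     <- ifelse(male, 135, 130) * ifelse(black, 1.08, 1)
  kappa <- ifelse(male, 0.9, 0.7)
  alpha <- ifelse(male, -0.207, -0.248)
  scr   <- DF[["creatinine"]] / kappa
  scys  <- DF[["cystatinC"]] / 0.8
  # min(x, 1)^a * max(x, 1)^b is x^a for x <= 1 and x^b for x > 1
  A * pmin(scr, 1)^alpha * pmax(scr, 1)^(-0.601) *
    pmin(scys, 1)^(-0.375) * pmax(scys, 1)^(-0.711) * 0.995^DF[["age"]]
}

# sex and race are stored as factors, so compare against their labels
black <- reconstruct$race == "1"
check <- data.frame(
  posted       = reconstruct$eGFR,
  sex1_is_male = eGFR_minmax(reconstruct, reconstruct$sex == "1", black),
  sex0_is_male = eGFR_minmax(reconstruct, reconstruct$sex == "0", black)
)
print(round(check, 2))
```

Whichever reading turns out to be right, the same indicator can be passed to eGFRfun() by recoding sex before calling it.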