0

I am working with a dataset of about 2000 rows and 22 columns. I need to assign a score of 0-4 into a new column, based on the contents of two others. Here's the example:

SAMP_SMPT_USER_REFERENCE MEAS_ANAL_METH_CODE oP.Res TP.Res OP_METH TP_METH
0031 21 0.02 0.04
0032 24 0.12 0.003

(with apologies for how the database generated that heading)

I am trying to build a loop that looks at MEAS_ANAL_METH_CODE and oP.Res. First it will see if MEAS_ANAL_METH_CODE is 21 or 24, then based on that answer it will look at the value in oP.Res and add a category (0,1,2,3,4) to the OP_METH column. Then it will so it again for TP.Res, looking at MEAS_ANAL_METH_CODE, TP.Res and assigning a value to TP_Meth.

I'm new at loops!

This is what I have tried:


for(oP.Res in Merged){
  
  if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res < 0.01) {
    "0"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
    "1"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
    "2"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
    "3"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.1) {
    "4"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res < 0.01) {
    "4"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
    "3"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
    "2"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
    "1"
  } else {"0"} 
}

I can see there is probably an issue about R not knowing where to put the outputs, but I cannot figure it out.

3
  • 1
    Consider using cut to break oP.Res into 4 categories first. Commented Jun 17, 2022 at 8:10
  • you can try with something like: your_dataframe %>% mutate(your_new_column_name = case_when(each_condition_that_defines_your_desired_output_value ~ your_desired_output_value)) Commented Jun 17, 2022 at 8:11
  • Note that (in R) looping over rows happens automatically when working with dataframe columns. Writing a loop will, at best, unnecessarily complicate your code. Commented Jun 17, 2022 at 8:20

2 Answers 2

1

A solution using loops and a lookup list.

First store the cut breaks and labels for each code in a list.

tmp=list(
  "21"=list(
    "brk"=c(0,0.01,0.0375,0.0725,0.1,1),
    "lab"=0:4
  ),
  "24"=list(
    "brk"=c(0,0.01,0.0375,0.0725,0.1,1),
    "lab"=4:0
  )
)

Then loop over the columns of interest and for each code apply the cut function.

for(cc in c("oP.Res","TP.Res")) {
  Merged[paste0(cc,"_cut")]=NA
  for (ctg in unique(Merged$MEAS_ANAL_METH_CODE)) {
    Merged[Merged$MEAS_ANAL_METH_CODE==ctg,paste0(cc,"_cut")]=
      as.character(
        cut(
          Merged[Merged$MEAS_ANAL_METH_CODE==ctg,cc],
          tmp[[as.character(ctg)]][["brk"]],
          tmp[[as.character(ctg)]][["lab"]]
        )
      )
  }
}
Sign up to request clarification or add additional context in comments.

1 Comment

Worked like a charm! I spent literally hours yesterday trying to figure this out, hero.
0

another option, taking advantage of the readability of {dplyr} pipes:

given the example dataframe merged ...

merged <- structure(list(SAMP_SMPT_USER_REFERENCE = 31:32, MEAS_ANAL_METH_CODE = c(21L, 
24L), oP.Res = c(0.02, 0.12), TP.Res = c(0.04, 0.003), OP_METH = c(NA, 
NA), TP_METH = c(NA, NA)), class = "data.frame", row.names = 1:2)


## > merged
##   SAMP_SMPT_USER_REFERENCE MEAS_ANAL_METH_CODE oP.Res TP.Res OP_METH TP_METH
## 1                       31                  21   0.02  0.040      NA      NA
## 2                       32                  24   0.12  0.003      NA      NA

... you can apply the same classification rule across multiple columns (and obtain specifically named columns) like so:

library(dplyr)

merged %>%
  mutate(across(c(oP.Res, TP.Res),
                .fns = function(result) {
                  cat_borders <- c(10, 375, 725, 1000, Inf) / 1000
                  cat = findInterval(result, cat_borders)
                  ## category borders for method 21 are the reverse
                  ## of those for method 24, we can mirror by subtracting from 5:
                  ifelse(MEAS_ANAL_METH_CODE == 24, cat, 5 - cat)
                },
                .names = "OP_METH_CAT_{.col}"
                )
         )

... result:

##   SAMP_SMPT_USER_REFERENCE MEAS_ANAL_METH_CODE oP.Res TP.Res OP_METH TP_METH
## 1                       31                  21   0.02  0.040      NA      NA
## 2                       32                  24   0.12  0.003      NA      NA
##   OP_METH_CAT_oP.Res OP_METH_CAT_TP.Res
## 1                  4                  4
## 2                  1                  0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.