Using an if else loop to populate a dataframe in R

Question

I am working with a dataset of about 2000 rows and 22 columns. I need to assign a score of 0-4 into a new column, based on the contents of two others. Here's the example:

SAMP_SMPT_USER_REFERENCE	MEAS_ANAL_METH_CODE	oP.Res	TP.Res	OP_METH	TP_METH
0031	21	0.02	0.04
0032	24	0.12	0.003

(with apologies for how the database generated that heading)

I am trying to build a loop that looks at MEAS_ANAL_METH_CODE and oP.Res. First it will see if MEAS_ANAL_METH_CODE is 21 or 24, then based on that answer it will look at the value in oP.Res and add a category (0,1,2,3,4) to the OP_METH column. Then it will so it again for TP.Res, looking at MEAS_ANAL_METH_CODE, TP.Res and assigning a value to TP_Meth.

I'm new at loops!

This is what I have tried:


for(oP.Res in Merged){
  
  if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res < 0.01) {
    "0"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
    "1"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
    "2"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
    "3"
  } else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.1) {
    "4"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res < 0.01) {
    "4"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
    "3"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
    "2"
  } else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
    "1"
  } else {"0"} 
}

I can see there is probably an issue about R not knowing where to put the outputs, but I cannot figure it out.

you can try with something like: your_dataframe %>% mutate(your_new_column_name = case_when(each_condition_that_defines_your_desired_output_value ~ your_desired_output_value)) — bpvalderrama
– bpvalderrama, Commented Jun 17, 2022 at 8:11
Note that (in R) looping over rows happens automatically when working with dataframe columns. Writing a loop will, at best, unnecessarily complicate your code. — user18309711
– user18309711, Commented Jun 17, 2022 at 8:20

user2974951 · Accepted Answer · 2022-06-17 08:36:19Z

1

A solution using loops and a lookup list.

First store the cut breaks and labels for each code in a list.

tmp=list(
  "21"=list(
    "brk"=c(0,0.01,0.0375,0.0725,0.1,1),
    "lab"=0:4
  ),
  "24"=list(
    "brk"=c(0,0.01,0.0375,0.0725,0.1,1),
    "lab"=4:0
  )
)

Then loop over the columns of interest and for each code apply the cut function.

for(cc in c("oP.Res","TP.Res")) {
  Merged[paste0(cc,"_cut")]=NA
  for (ctg in unique(Merged$MEAS_ANAL_METH_CODE)) {
    Merged[Merged$MEAS_ANAL_METH_CODE==ctg,paste0(cc,"_cut")]=
      as.character(
        cut(
          Merged[Merged$MEAS_ANAL_METH_CODE==ctg,cc],
          tmp[[as.character(ctg)]][["brk"]],
          tmp[[as.character(ctg)]][["lab"]]
        )
      )
  }
}

answered Jun 17, 2022 at 8:36

user2974951

10.4k2 gold badges21 silver badges31 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Dani.Elizabeth Over a year ago

Worked like a charm! I spent literally hours yesterday trying to figure this out, hero.

user18309711 · Accepted Answer · 2022-06-17 13:34:37Z

another option, taking advantage of the readability of {dplyr} pipes:

given the example dataframe merged ...

merged <- structure(list(SAMP_SMPT_USER_REFERENCE = 31:32, MEAS_ANAL_METH_CODE = c(21L, 
24L), oP.Res = c(0.02, 0.12), TP.Res = c(0.04, 0.003), OP_METH = c(NA, 
NA), TP_METH = c(NA, NA)), class = "data.frame", row.names = 1:2)


## > merged
##   SAMP_SMPT_USER_REFERENCE MEAS_ANAL_METH_CODE oP.Res TP.Res OP_METH TP_METH
## 1                       31                  21   0.02  0.040      NA      NA
## 2                       32                  24   0.12  0.003      NA      NA

... you can apply the same classification rule across multiple columns (and obtain specifically named columns) like so:

library(dplyr)

merged %>%
  mutate(across(c(oP.Res, TP.Res),
                .fns = function(result) {
                  cat_borders <- c(10, 375, 725, 1000, Inf) / 1000
                  cat = findInterval(result, cat_borders)
                  ## category borders for method 21 are the reverse
                  ## of those for method 24, we can mirror by subtracting from 5:
                  ifelse(MEAS_ANAL_METH_CODE == 24, cat, 5 - cat)
                },
                .names = "OP_METH_CAT_{.col}"
                )
         )

... result:

##   SAMP_SMPT_USER_REFERENCE MEAS_ANAL_METH_CODE oP.Res TP.Res OP_METH TP_METH
## 1                       31                  21   0.02  0.040      NA      NA
## 2                       32                  24   0.12  0.003      NA      NA
##   OP_METH_CAT_oP.Res OP_METH_CAT_TP.Res
## 1                  4                  4
## 2                  1                  0

Collectives™ on Stack Overflow

Using an if else loop to populate a dataframe in R

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related