How to do multiple if/else logic calculations and apply them to several columns in a dataframe?

Question

Suppose I have a dataframe df in R like this:

    A    B    C    D
   1.4  4.0  6.0  1.0
   2.5  1.5  2.4  2.3
   3.0  1.7  2.5  3.4

I want to write a function that checks the value of each cell in several specified columns, performs a calculation that depends on that value, and puts the results in new columns.

Take a simple example. For columns A, B and D, call the value in a given cell x: if x < 2, add 1; if 2 <= x < 3, add 3; if 3 <= x < 4, add 5; otherwise leave x unchanged. I want to store the results in three new columns called A_New, B_New and D_New.

So this is what I want to get:

   A    B     C    D      A_New   B_New   D_New
 1.4   4.0   6.0   1.0     2.4     4.0     2.0
 2.5   1.5   2.4   2.3     5.5     2.5     5.3
 3.0   1.7   2.5   3.4     8.0     2.7     8.4

I am struggling to create R code that will do this (preferably using dplyr / tidyverse library). Please help.

@TimG, I can appreciate you might not think your solution is worth an answer, but it's better to post answers (even fairly small/trivial ones) as answers rather than as comments ... (not sure why someone downvoted your answer, I voted to undelete it ...)
– Ben Bolker, Commented Apr 18 at 12:57

Ben Bolker · Accepted Answer · 2025-04-18 12:51:37Z

6

As @Limey says in comments, dplyr::across() (+ case_when()) does everything you need ...

dd <-  read.table(header=TRUE, text = "
   A    B    C    D
   1.4  4.0  6.0  1.0
   2.5  1.5  2.4  2.3
   3.0  1.7  2.5  3.4
")

library(dplyr)
dd |> 
   mutate(across(c(A, B, D),
          .names = "{.col}_New",
          ~ case_when(. < 2  ~ . + 1,
                      . < 3  ~ . + 3,
                      . < 4  ~ . + 5,
                      .default = .)))

The tests in case_when() are evaluated sequentially, so (for example) we don't need to test for x >= 2 in the second case.
There might be some marginally more efficient way to construct a mathematical expression to do this (e.g. "if x < 4, add 2*max(floor(x) - 1, 0) + 1 to x"), or something clever using cut() and a matching vector of increments, but it will be harder to understand and less generalizable ...
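The cut() idea mentioned above could be sketched roughly as follows. This is only one possible reading of that suggestion, not code from the answer: the names increments and add_increment are made up for illustration, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Sample data from the question
dd <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))

# Hypothetical lookup vector: one increment per interval
# (-Inf, 2), [2, 3), [3, 4), [4, Inf)
increments <- c(1, 3, 5, 0)

# Hypothetical helper: bin x with cut(), then add the matching increment
add_increment <- function(x) {
  # right = FALSE makes the intervals closed on the left, e.g. [2, 3);
  # labels = FALSE returns integer bin codes instead of a factor
  bin <- cut(x, breaks = c(-Inf, 2, 3, 4, Inf), right = FALSE, labels = FALSE)
  x + increments[bin]
}

cols <- c("A", "B", "D")
dd[paste0(cols, "_New")] <- lapply(dd[cols], add_increment)
dd
```

Changing the rule then only means editing the breaks and the increments vector, at the cost of the rule being less readable than the explicit case_when() version.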

edited Apr 18 at 12:51

answered Apr 17 at 22:04

Ben Bolker


1 Comment

M-- Apr 18 at 2:33

@BenBolker stackoverflow.com/a/78973331

ThomasIsCoding · Accepted Answer · 2025-04-17 22:29:45Z

6

With base R, you can use findInterval like below

> nms <- c("A", "B", "D")
> df[paste0(nms, "_New")] <- df[nms] + c(1, 3, 5, 0)[findInterval(unlist(df[nms]), c(2, 3, 4)) + 1]
> df
    A   B   C   D A_New B_New D_New
1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
3 3.0 1.7 2.5 3.4   8.0   2.7   8.4

answered Apr 17 at 22:29

ThomasIsCoding


1 Comment

Ben Bolker Apr 17 at 23:27

This is nice. I might choose to do it in a few separate steps for interpretability, e.g. something like offset <- c(1, 3, 5, 0); fcat <- findInterval(...) + 1; ... <- df[nms] + offset[fcat]
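Spelled out, the stepwise version outlined in this comment might look like the sketch below. The names offset and fcat come from the comment; the sample data is recreated from the question so the block is self-contained.

```r
# Sample data from the question
df <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))
nms <- c("A", "B", "D")

# Increment for each interval: x < 2, [2, 3), [3, 4), x >= 4
offset <- c(1, 3, 5, 0)

# findInterval() counts how many break points are <= x (0, 1, 2 or 3 here),
# so adding 1 turns that count into an index into offset
fcat <- findInterval(unlist(df[nms]), c(2, 3, 4)) + 1

# unlist() flattens the columns one after another, which matches the
# column-wise order used when a vector is added to a data frame
df[paste0(nms, "_New")] <- df[nms] + offset[fcat]
df
```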

Darren Tsai · Accepted Answer · 2025-04-18 02:30:23Z

6

You can create a step function (stepfun) to define which value to add in each interval.

mystep <- stepfun(c(2, 3, 4), c(1, 3, 5, 0))

With {base}

nms <- c("A", "B", "D")
df[paste0(nms, "_New")] <- lapply(df[nms], \(x) x + mystep(x))

With {dplyr}

df %>%
  mutate(across(c(A, B, D), ~ .x + mystep(.x), .names = "{.col}_New"))

#     A   B   C   D A_New B_New D_New
# 1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
# 2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
# 3 3.0 1.7 2.5 3.4   8.0   2.7   8.4

Note: stepfun() returns a function as shown in the graph below:

plot(mystep)
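As a text alternative to the plot, evaluating the step function at a few points around the break points shows which interval each value falls into. This is a small sketch; the test points are arbitrary and chosen only for illustration.

```r
mystep <- stepfun(c(2, 3, 4), c(1, 3, 5, 0))

# With the default right = FALSE the intervals are closed on the left,
# so 2 already belongs to [2, 3) and 4 belongs to [4, Inf)
pts <- c(1.9, 2, 2.9, 3, 3.9, 4, 10)
mystep(pts)
# 1 3 3 5 5 0 0
```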

edited Apr 18 at 2:30

answered Apr 18 at 1:51

Darren Tsai


2 Comments

M-- Apr 18 at 2:30

I didn't know about plot(mystep). This is quite nice.

Onyambu Apr 18 at 3:03

mystep is already vectorized, so there is no need for lapply: df + mystep(unlist(df))
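Note that the one-liner in this comment, taken literally, touches every column. A sketch of the same vectorized idea restricted to A, B and D and written to new columns, assuming the df, nms and mystep objects from the answer, could be:

```r
# Sample data and helper objects as in the answer
df <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))
nms <- c("A", "B", "D")
mystep <- stepfun(c(2, 3, 4), c(1, 3, 5, 0))

# mystep() accepts a whole vector, so the selected columns can be
# flattened, stepped in one call, and added back column by column
df[paste0(nms, "_New")] <- df[nms] + mystep(unlist(df[nms]))
df
```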

lailaps · Accepted Answer · 2025-04-18 10:36:29Z

Another option could be to write your requirements as a logical function:

f(x) = x + (1⋅[x<2] + 3⋅[2≤x<3] + 5⋅[3≤x<4]), where [condition] is 1 if the condition holds and 0 otherwise, assuming that your cases don't logically overlap

cols <- c("A", "B", "D")
# multiply each increment by its condition: TRUE counts as 1, FALSE as 0
d[paste0(cols, "_New")] <- lapply(d[cols], \(x) x + (1 * (x < 2) +           # first case
                                                     3 * (x >= 2 & x < 3) +  # second case
                                                     5 * (x >= 3 & x < 4)))  # third case

giving

    A   B   C   D A_New B_New D_New
1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
3 3.0 1.7 2.5 3.4   8.0   2.7   8.4
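The approach relies on R coercing logical values to 0 and 1 in arithmetic. A minimal sketch on a single vector (x and increment are illustrative names, not part of the answer) makes the individual pieces visible:

```r
x <- c(1.4, 2.5, 3.0, 4.0)

# A comparison yields a logical vector; multiplying coerces it to 0/1
x < 2          # TRUE FALSE FALSE FALSE
1 * (x < 2)    # 1 0 0 0

# Because the conditions do not overlap, at most one term is non-zero
increment <- 1 * (x < 2) + 3 * (x >= 2 & x < 3) + 5 * (x >= 3 & x < 4)
increment      # 1 3 5 0

x + increment  # 2.4 5.5 8.0 4.0
```

If two conditions did overlap, both increments would be added for values in the overlap, which is why the non-overlap assumption matters here.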

Sample data

d <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7), C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))
