Suppose I have a dataframe df in R like this:

    A    B    C    D
   1.4  4.0  6.0  1.0
   2.5  1.5  2.4  2.3
   3.0  1.7  2.5  3.4

Say I want to write a function that checks the value of each cell in every row in several specified columns, and performs calculations on them depending on the value, and puts the results in new columns.

Take a simple example. For columns A, B and D, let x be the value in a given row: if x < 2, I want to add 1; if 2 <= x < 3, add 3; if 3 <= x < 4, add 5; otherwise leave the value unchanged. I want to store the results in three new columns called A_New, B_New and D_New.

So this is what I want to get:

   A    B     C    D      A_New   B_New   D_New
 1.4   4.0   6.0   1.0     2.4     4.0     2.0
 2.5   1.5   2.4   2.3     5.5     2.5     5.3
 3.0   1.7   2.5   3.4     8.0     2.7     8.4

I am struggling to create R code that will do this (preferably using dplyr / tidyverse library). Please help.

  • dplyr::across will be your friend. Commented Apr 17 at 22:00
  • @TimG, I can appreciate you might not think your solution is worth an answer, but it's better to post answers (even fairly small/trivial ones) as answers rather than as comments ... (not sure why someone downvoted your answer, I voted to undelete it ...) Commented Apr 18 at 12:57

4 Answers

As @Limey says in comments, dplyr::across() (+ case_when()) does everything you need ...

dd <-  read.table(header=TRUE, text = "
   A    B    C    D
   1.4  4.0  6.0  1.0
   2.5  1.5  2.4  2.3
   3.0  1.7  2.5  3.4
")
library(dplyr)
dd |> 
   mutate(across(c(A, B, D),
          .names = "{.col}_New",
          ~ case_when(. < 2  ~ . + 1,
                      . < 3  ~ . + 3,
                      . < 4  ~ . + 5,
                      .default = .)))
  • the tests in case_when are evaluated sequentially, so (for example) we don't need to test for x >= 2 in the second case
  • there might be some marginally more efficient way to construct a mathematical expression to do this (e.g. "if x < 4, add 2 * pmax(floor(x), 1) - 1 to x", or something clever using cut() and a matching vector of increments), but it will be harder to understand and less generalizable ...
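
The cut() idea mentioned above can be sketched roughly as follows. This is only an illustration, not the answer's own code: the helper name add_increment and the break/increment vectors are assumptions chosen to match the rules in the question, and right = FALSE is used so that each bin is closed on the left, like 2 <= x < 3.

```r
# Sample data from the question
dd <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))

breaks     <- c(-Inf, 2, 3, 4, Inf)  # bins: [-Inf,2), [2,3), [3,4), [4,Inf)
increments <- c(1, 3, 5, 0)          # amount to add in each bin

# add_increment() is a hypothetical helper name, not part of any package
add_increment <- function(x) {
  # labels = FALSE makes cut() return the integer index of the bin
  bin <- cut(x, breaks = breaks, right = FALSE, labels = FALSE)
  x + increments[bin]
}

cols <- c("A", "B", "D")
dd[paste0(cols, "_New")] <- lapply(dd[cols], add_increment)
dd
```

Keeping the breaks and increments in two parallel vectors makes it easy to add another interval, at the cost of the rule being less readable than the case_when() version.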

With base R, you can use findInterval like below

nms <- c("A", "B", "D")
df[paste0(nms, "_New")] <- df[nms] + c(1, 3, 5, 0)[findInterval(unlist(df[nms]), c(2, 3, 4)) + 1]
df
    A   B   C   D A_New B_New D_New
1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
3 3.0 1.7 2.5 3.4   8.0   2.7   8.4

1 Comment

This is nice. I might choose to do it in a few separate steps for interpretability, e.g. something like offset <- c(1, 3, 5, 0); fcat <- findInterval(...) + 1; ... <- df[nms] + offset[fcat]
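
A minimal sketch of the step-by-step variant suggested in this comment, assuming df holds the sample data from the question. The names offset and fcat come from the comment; everything else follows the findInterval answer.

```r
# Sample data from the question
df <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))

nms    <- c("A", "B", "D")
offset <- c(1, 3, 5, 0)   # increments for x < 2, [2,3), [3,4), x >= 4

# findInterval() returns 0 for values below the first cut point,
# so add 1 to turn the result into a valid index into offset
fcat <- findInterval(unlist(df[nms]), c(2, 3, 4)) + 1

# unlist() flattens column by column, which matches how the vector
# is recycled when it is added to the data frame
df[paste0(nms, "_New")] <- df[nms] + offset[fcat]
df
```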

You can create a step function (stepfun) to define which value to add in each interval.

mystep <- stepfun(c(2, 3, 4), c(1, 3, 5, 0))
  1. With {base}
nms <- c("A", "B", "D")
df[paste0(nms, "_New")] <- lapply(df[nms], \(x) x + mystep(x))
  2. With {dplyr}
df %>%
  mutate(across(c(A, B, D), ~ .x + mystep(.x), .names = "{.col}_New"))

#     A   B   C   D A_New B_New D_New
# 1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
# 2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
# 3 3.0 1.7 2.5 3.4   8.0   2.7   8.4

Note: stepfun() returns a function, which plot() can draw as a step graph:

plot(mystep)

2 Comments

I didn't know about plot(mystep). This is quite nice.
mystep is vectorized already. No need for lapply: df + mystep(unlist(df))
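
As written, the expression in this comment would shift every column, including C. A hedged sketch of the same idea restricted to the chosen columns, assuming df holds the sample data from the question:

```r
# Sample data from the question
df <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7),
                 C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))

# Step function: 1 below 2, 3 on [2,3), 5 on [3,4), 0 from 4 upwards
mystep <- stepfun(c(2, 3, 4), c(1, 3, 5, 0))

nms <- c("A", "B", "D")

# mystep() accepts a whole vector, so the selected columns can be
# flattened, evaluated in one call, and added back column by column
df[paste0(nms, "_New")] <- df[nms] + mystep(unlist(df[nms]))
df
```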

Another option could be to write your requirements as a logical function

f(x) = x + (1⋅[x<2] + 3⋅[2≤x<3] + 5⋅[3≤x<4]), where [condition] is 1 if the condition holds and 0 otherwise, assuming that your cases don't logically overlap

cols <- c("A", "B", "D")                         #add * (condition) 
d[paste0(cols, "_New")] <- lapply(d[cols],\(x)x +(1*(x<2) +      # first case
                                                  3*(x>=2&x<3) + # 2nd case
                                                  5*(x>=3&x<4)   # 3rd case
                                                  ))

giving

    A   B   C   D A_New B_New D_New
1 1.4 4.0 6.0 1.0   2.4   4.0   2.0
2 2.5 1.5 2.4 2.3   5.5   2.5   5.3
3 3.0 1.7 2.5 3.4   8.0   2.7   8.4
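
Because this approach sums the increments of every condition that is TRUE, overlapping cases would be added together. A quick way to check the no-overlap assumption is to count how many conditions match on a grid of test values. This is only a sketch; the grid x and the name hits are arbitrary choices for illustration.

```r
# Grid of test values covering all intervals and their boundaries
x <- seq(0, 5, by = 0.25)

# Number of conditions that are TRUE for each test value
hits <- (x < 2) + (x >= 2 & x < 3) + (x >= 3 & x < 4)

# Every value should match at most one case
stopifnot(all(hits <= 1))
```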

Sample data

d <- data.frame(A = c(1.4, 2.5, 3.0), B = c(4.0, 1.5, 1.7), C = c(6.0, 2.4, 2.5), D = c(1.0, 2.3, 3.4))
