I want to write a function that programmatically adds columns based on a user-provided list of input variables and cutoff values.

Specifically, I want to define a function

myfun <- function(df, varlist, cutofflist)

that returns df with one extra column for each variable in varlist: a logical column indicating whether that variable is at most its corresponding cutoff value.

For example, supposing we take the iris data frame,

df <- as_tibble(iris)
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ... with 140 more rows

I would like the call

myfun(df, c("Sepal.Length", "Petal.Length"), list(Sepal.Length = 5, Petal.Length = 1.5))

to produce the same result as

df %>%
   mutate(
      Sepal.Length_indicator = (Sepal.Length <= 5),
      Petal.Length_indicator = (Petal.Length <= 1.5)
   )

i.e. this:

# A tibble: 150 x 7
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_indicator Petal.Length_indicator
          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <lgl>                  <lgl>                 
 1          5.1         3.5          1.4         0.2 setosa  FALSE                  TRUE                  
 2          4.9         3            1.4         0.2 setosa  TRUE                   TRUE                  
 3          4.7         3.2          1.3         0.2 setosa  TRUE                   TRUE                  
 4          4.6         3.1          1.5         0.2 setosa  TRUE                   TRUE                  
 5          5           3.6          1.4         0.2 setosa  TRUE                   TRUE                  
 6          5.4         3.9          1.7         0.4 setosa  FALSE                  FALSE                 
 7          4.6         3.4          1.4         0.3 setosa  TRUE                   TRUE                  
 8          5           3.4          1.5         0.2 setosa  TRUE                   TRUE                  
 9          4.4         2.9          1.4         0.2 setosa  TRUE                   TRUE                  
10          4.9         3.1          1.5         0.1 setosa  TRUE                   TRUE                  
# ... with 140 more rows

I am fairly new to quosures and tidy evaluation in dplyr. What I have tried so far is the following:

myfun <- function(df, varlist, cutofflist){
  df %>%
    mutate_at(.vars = varlist, .funs = list(indicator = function(x) x<= cutofflist[[?]]))
}

but I don't know what should replace the ? above. This approach works when the cutoff is the same for every variable, but not when the cutoff depends on the variable.

Thank you in advance for your help.

1 Answer

Here is one option using map2_dfc() and transmute():

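library(tidyverse)

myfun <- function(data, varVec, cutofflist) {
  # For each variable/cutoff pair, build a one-column tibble holding the indicator
  map2_dfc(varVec, cutofflist[varVec], ~
    data %>%
      transmute(!!paste0(.x, "_indicator") := !!rlang::sym(.x) <= .y)) %>%
    # Append the indicator columns to the input (data, not the global df)
    bind_cols(data, .)
}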


out2 <- myfun(df, c("Sepal.Length", "Petal.Length"), 
        list(Sepal.Length = 5, Petal.Length = 1.5))   

Checking the output against the result computed outside the function:

out1 <- df %>%
         mutate(
           Sepal.Length_indicator = (Sepal.Length <= 5),
           Petal.Length_indicator = (Petal.Length <= 1.5)
          )    

identical(out1, out2)
#[1] TRUE
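
The per-column step inside the function can also be looked at in isolation. The following is a minimal sketch, not part of the original answer: add_indicator is a hypothetical helper name, and it uses the .data pronoun in place of rlang::sym(), which should be equivalent here. The new column name is built as a string and unquoted with !! on the left-hand side of :=.

```r
library(dplyr)

# Hypothetical helper (name is illustrative): add one indicator column
# for a single variable given as a string.
add_indicator <- function(data, var, cutoff) {
  new_name <- paste0(var, "_indicator")
  # !!new_name := ... lets the column name come from a string;
  # .data[[var]] looks the column up by its name.
  data %>% mutate(!!new_name := .data[[var]] <= cutoff)
}

res <- add_indicator(as_tibble(iris), "Sepal.Length", 5)
```

Calling this helper once per variable would give the same columns as the map2_dfc() version above.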

Alternatively, map_dfc() alone is enough, because the names of cutofflist match the entries of varVec:

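myfun <- function(data, varVec, cutofflist) {
  # Look up each cutoff by variable name instead of iterating over two inputs
  map_dfc(varVec, ~
    data %>%
      transmute(!!paste0(.x, "_indicator") :=
                  !!rlang::sym(.x) <= cutofflist[[.x]])) %>%
    # Append the indicator columns to the input (data, not the global df)
    bind_cols(data, .)
}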
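
For readers closer to the mutate_at() attempt in the question, the missing ? can be filled with the current column's name. This is a hedged sketch, not part of the original answer, assuming a recent dplyr release that provides across() and cur_column(); myfun_across is a hypothetical name chosen to avoid clashing with myfun.

```r
library(dplyr)

# Hypothetical variant (name is illustrative): cur_column() returns the name
# of the column across() is currently processing, so it can index cutofflist.
myfun_across <- function(data, varlist, cutofflist) {
  data %>%
    mutate(across(
      all_of(varlist),
      ~ .x <= cutofflist[[cur_column()]],
      .names = "{.col}_indicator"   # keep originals, add suffixed columns
    ))
}

df <- as_tibble(iris)
out <- myfun_across(df, c("Sepal.Length", "Petal.Length"),
                    list(Sepal.Length = 5, Petal.Length = 1.5))
```

This keeps everything in a single mutate() call and relies on cutofflist being named by variable, as in the example call.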