I have a data frame with two columns: date and return. I want to add several new columns with mutate, each of which depends on two parameters: a threshold and a lag. The rule is simple; each new column is calculated as follows:

var = ifelse(lag(return, n = lag_day) > threshold, return, NA)

If the lagged return is higher than the threshold, give me the return value; otherwise give me NA.
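To make the rule concrete, here is a minimal sketch for a single pair (threshold 2, lag 1), applied to the return values from the example data further down. The helper `lag_base` is a hypothetical base R stand-in for `dplyr::lag` so that the snippet runs without any packages; it is not part of the question's code.

```r
# Return values from the example data frame in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2)

# Hypothetical helper: shift x down by n positions and pad the start with NA
lag_base <- function(x, n) head(c(rep(NA, n), x), length(x))

# Keep the return only where the return one row earlier exceeds 2
var_t1_lag1 <- ifelse(lag_base(ret, 1) > 2, ret, NA)
print(var_t1_lag1)
```

The first element is NA because there is no earlier row to compare against, and ifelse passes an NA condition through as NA.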

Here are the values for the thresholds and the lag_days:

threshold=c(2,4,6)
lag_day=c(1,2,3)

Here I'm solving my problem manually:

test<-df%>%
  mutate(var_t1_lag1= ifelse(lag(return, n= lag_day[1] )>threshold[1],return, NA))%>%
  mutate(var_t2_lag1= ifelse(lag(return, n= lag_day[1] )>threshold[2],return, NA))%>%
  mutate(var_t3_lag1= ifelse(lag(return, n= lag_day[1] )>threshold[3],return, NA))%>%
  mutate(var_t1_lag2= ifelse(lag(return, n= lag_day[2] )>threshold[1],return, NA))%>%
  mutate(var_t2_lag2= ifelse(lag(return, n= lag_day[2] )>threshold[2],return, NA))%>%
  mutate(var_t3_lag2= ifelse(lag(return, n= lag_day[2] )>threshold[3],return, NA))%>%
  mutate(var_t1_lag3= ifelse(lag(return, n= lag_day[3] )>threshold[1],return, NA))%>%
  mutate(var_t2_lag3= ifelse(lag(return, n= lag_day[3] )>threshold[2],return, NA))%>%
  mutate(var_t3_lag3= ifelse(lag(return, n= lag_day[3] )>threshold[3],return, NA))

Is there a solution that would make this easier? Maybe with one or two apply functions?

Here is my example-dataframe:

library(tibble)     # tibble()
library(lubridate)  # today()

df <- tibble(
  date = today() + 0:12,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2)
)

2 Answers

One option is to build all combinations of the threshold index and lag_day with crossing, loop over the rows of that grid with pmap, use transmute to create each column of interest, and bind the result to the original dataset. Apart from the tidyverse, this uses one function from base R (seq_along).

library(tidyverse)
crossing(threshold = seq_along(threshold), lag_day) %>%
    pmap_dfc(~  
             df %>%
               transmute(!! str_c("var_t", ..1, "_lag", ..2) := 
                  case_when(lag(return, n = ..2) > threshold[..1] ~ return, 
                            TRUE ~ NA_real_))) %>% 
   bind_cols(df, .)

2 Comments

It's better to omit threshold[ ] in the 5th line of the function; otherwise it can lead to errors. I'm now working with the changed version: case_when(lag(return, n = ..2) > ..1 ~ return, (only line 5 is changed)
@TobKel Initially, I was using crossing(threshold, lag_day) with ..1, but then I needed to name the columns by their position in the sequence, which is why I used seq_along with threshold[..1]. Interestingly, it does not give an error for me.
A mostly base R approach, using two apply loops together with dplyr::lag:

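# Column names are var_t<threshold index>_<lag index>. The threshold index must
# vary fastest so the names line up with the column order produced by cbind().
df[paste0("var_t", outer(seq_along(threshold), seq_along(lag_day),
   FUN = paste, sep = "_"))] <-  do.call(cbind, 
     lapply(lag_day, function(x) sapply(threshold, function(y) 
            ifelse(dplyr::lag(df$return, n = x) > y, df$return, NA))))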


#   date       return var_t1_1 var_t2_1 var_t3_1 var_t1_2 var_t2_2 var_t3_2 var_t1_3 var_t2_3 var_t3_3
#   <date>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1 2019-05-21    1       NA       NA         NA     NA         NA       NA       NA       NA       NA
# 2 2019-05-22    2.5     NA       NA         NA     NA         NA       NA       NA       NA       NA
# 3 2019-05-23    2        2       NA         NA     NA         NA       NA       NA       NA       NA
# 4 2019-05-24    3       NA       NA         NA      3         NA       NA       NA       NA       NA
# 5 2019-05-25    5        5       NA         NA     NA         NA       NA        5       NA       NA
# 6 2019-05-26    6.5      6.5      6.5       NA      6.5       NA       NA       NA       NA       NA
# 7 2019-05-27    1        1        1          1      1          1       NA        1       NA       NA
# 8 2019-05-28    9       NA       NA         NA      9          9        9        9        9       NA
# 9 2019-05-29    3        3        3          3     NA         NA       NA        3        3        3
#10 2019-05-30    2        2       NA         NA      2          2        2       NA       NA       NA
#11 2019-05-31    4       NA       NA         NA      4         NA       NA        4        4        4
#12 2019-06-01    7        7       NA         NA     NA         NA       NA        7       NA       NA
#13 2019-06-02    2        2        2          2      2         NA       NA       NA       NA       NA
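For readers who prefer an explicit loop and no package dependency, the same nine columns can be sketched with expand.grid. This is only an illustration under stated assumptions: `lag_base` is a hypothetical replacement for `dplyr::lag`, the data frame holds only the return column, and the `var_t<i>_lag<j>` names follow the question's manual version rather than this answer's shorter names.

```r
threshold <- c(2, 4, 6)
lag_day <- c(1, 2, 3)
df <- data.frame(return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2))

# Hypothetical helper: shift x down by n positions and pad the start with NA
lag_base <- function(x, n) head(c(rep(NA, n), x), length(x))

# One row per (threshold index, lag index) pair; the threshold index varies fastest
grid <- expand.grid(ti = seq_along(threshold), li = seq_along(lag_day))

for (k in seq_len(nrow(grid))) {
  ti <- grid$ti[k]
  li <- grid$li[k]
  col <- paste0("var_t", ti, "_lag", li)
  # Keep the return where the lagged return exceeds the threshold, else NA
  df[[col]] <- ifelse(lag_base(df$return, lag_day[li]) > threshold[ti],
                      df$return, NA)
}

print(names(df))
```

Because the names are built from the same indices that select the values, this version stays consistent even when threshold and lag_day have different lengths.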
