1

As a disclaimer, I rarely use any apply functions, and this might be the reason why I am struggling with this task.

I would like to create multiple new columns (say 5) within an R dataframe

imagine a dataset representing the daily closing price of the S&P500

df <- data.frame(symbol = rep(S&P_500, n = 5), value =c(2500,2550,2560,2600,2400))

If I wanted (inefficiently) to create 5 new variables, each indicating for each row, the lag value n, (n, 1:5)

I could do :

df %>%
mutate(S&P500_LAG_1 = lag(value, n=1),
       S&P500_LAG_2 = lag(value, n=2),
       S&P500_LAG_3 = lag(value, n=3),
       S&P500_LAG_4 = lag(value, n=4),
       S&P500_LAG_5 = lag(value, n=5))

How could I reduce this, (and also probably make it more efficient) and create a function for this that takes as inputs the symbol (so that I can scale this method to other symbols) and outputs the lag value of the symbol I pass on as an input?

Thanks!

0

1 Answer 1

3

We could use shift from data.table that can take multiple values for n

library(data.table)
setDT(df)[,paste0("S&P500_LAG_", 1:5) :=  shift(value, n = 1:5)]

If there are multiple symbols in a single column, then do a group by symbol

setDT(df)[,paste0("symbol_LAG_", 1:5) :=  shift(value, n = 1:5), by = symbol]

In tidyverse, one approach is to map

library(dplyr)
library(stringr)
library(purrr)
map_dfc(1:5, ~ df %>%
              transmute(!! str_c("S&P500_LAG_", .x) := lag(value, n = .x))) %>%
   bind_cols(df, .)

With multiple symbols

map_dfc(1:5, ~ df %>%
              group_by(symbol) %>%
              transmute(!! str_c("symbol_LAG_", .x) := lag(value, n = .x))) %>%
   bind_cols(df, .)
Sign up to request clarification or add additional context in comments.

1 Comment

For the multiple symbols tidyverse solution @akrun you're missing a %>% but then when I got it to run it kept producing annoying extra columns so I actually would up going this way which is 90% your solution with a small tweak. map_dfc(1:5, ~ df %>% group_by(symbol) %>% mutate(!! str_c("symbol_LAG_", .x) := lag(value, n = .x))) %>% select(symbol, value, contains("symbol_LAG_"))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.