3

My goal is to go through various signals and ignore any 1's that are not part of a series (minimum of at least two 1's in a row). The data is an xts time series with 180K+ columns and 84 months. I've provided a small simplified data set I've used a nest for loop, but it's taking way too long to finish on the entire data set. It works but is horribly inefficient.

I know there's some way to use an apply function, but I can't figure it out.

Example data:

    mod_sig <- data.frame(a = c(0,1,0,0,0,1,1,0,0,0,1,0,1,1), 
                          b = c(0,0,1,0,0,1,0,0,0,1,1,1,1,1), 
                          c = c(0,1,0,1,0,1,1,1,0,0,0,1,1,0), 
                          d = c(0,1,1,1,0,1,1,0,0,1,1,1,1,1),
                          e = c(0,0,0,0,0,0,0,0,0,0,1,0,0,0))

    mod_sig <- xts(mod_sig, order.by = as.Date(seq(as.Date("2016-01-01"), as.Date("2017-02-01"), by = "month")))

Example code:

   # fixing months where condition is only met for one month
   # creating a new data frame for modified signals
   Signals_Fin <- data.frame(matrix(nrow = nrow(mod_sig), ncol = ncol(mod_sig)))
   colnames(Signals_Fin) <- colnames(mod_sig)

   # Loop over Signals to change 1's to 0's for one month events
   for(col in 1:ncol(mod_sig)) {
     for(row in 1:nrow(mod_sig)) {
       val <- ifelse(mod_sig[row,col] == 1, 
                     ifelse(mod_sig[row-1,col] == 0, 
                            ifelse(mod_sig[row+1,col] == 0,0,1),1),0)
       Signals_Fin[row, col] <- val
     }
   }

As you can see with the loop, any 1's that aren't in a sequence are changed to 0's. I know there is a better way, so I'm hoping to improve my approach. Any insights would be greatly appreciated. Thanks!

Answer from Zack and Ryan:

Zack and Ryan were spot on with dyplr, I only made slight modifications based off what was given and some colleague help.

Answer code:

    mod_sig <- data.frame(a = c(0,1,0,0,0,1,1,0,0,0,1,0,1,1), 
                      b = c(0,0,1,0,0,1,0,0,0,1,1,1,1,1), 
                      c = c(0,1,0,1,0,1,1,1,0,0,0,1,1,0), 
                      d = c(0,1,1,1,0,1,1,0,0,1,1,1,1,1),
                      e = c(0,0,0,0,0,0,0,0,0,0,1,0,0,0))

    Signals_fin = mod_sig %>% 
                  mutate_all(funs(ifelse((. == 1 & (lag(.) == 1 | lead(.) == 1)),1,0))) %>% 
                  mutate_all(funs(ifelse(is.na(.), 0, .)))


    Signals_fin <- xts(Signals_fin, order.by = as.Date(seq(as.Date("2016-01-01"), as.Date("2017-02-01"), by = "month")))
3
  • Apply is as slow as any loop. The solution here is to loop over a matrix and not a data.frame. The speed increase exponentially Commented May 17, 2018 at 16:42
  • Or an alternative can be provided by an elegant data.table solution Commented May 17, 2018 at 16:46
  • Thanks for the suggestions, looks like Zach and Ryan were spot on with dyplr. Commented May 17, 2018 at 19:54

1 Answer 1

2

here's a stab from a dplyr perspective, I converted your row_names to a column but you can just as easily convert them back to rownames with tibble::column_to_rownames():

library(dplyr)
library(tibble)

mod_sig %>%
  as.data.frame() %>%
  rownames_to_column('months') %>%
  mutate_at(vars(-months), function(x){
    if_else(x == 1 & 
              (lag(x, order_by = .$months) == 1 | 
                 lead(x, order_by = .$months) == 1),
            1,
            0)
  })

As suggested by @Ryan, his mutate_at call is more elegant, it's important everything is already sorted, though:

mod_sig %>%
  as.data.frame() %>%
  rownames_to_column('months') %>%
  mutate_at(vars(-months),  ~ as.numeric(.x & (lag(.x) | lead(.x))))

And to build on his suggestion:

mod_sig %>%
  as.data.frame() %>%
  mutate_all(~ as.numeric(.x & (lag(.x) | lead(.x))))
Sign up to request clarification or add additional context in comments.

1 Comment

This is great, dyplr was definetly the way to go. I ended up using basically what you had, with minor revisions. Thanks again!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.