Generate sequence from a value using R

Question

I have a dataset that looks like this:

date    |country   | crisis | 
2011 q2 |   AT     |    0   |
2011 q3 |   AT     |    0   |
2011 q4 |   AT     |    1   |
2012 q1 |   AT     |    0   |
2012 q2 |   AT     |    1   | 
2011 q1 |   BE     |    0   |
2011 q3 |   BE     |    0   |
2011 q4 |   BE     |    0   |
2012 q1 |   BE     |    1   |
2012 q2 |   BE     |    0   |

And I would like to compute another variable for each country-crisis pair that is 0 when crisis is 1 and then create a sequence starting from zero, like this

date    |country   | crisis | AT_crisis1 | AT_crisis2 | BE_crisis 1
2011 q2 |   AT     |    0   |  -2        |     -4     |    NA
2011 q3 |   AT     |    0   |  -1        |     -3     |    NA
2011 q4 |   AT     |    1   |   0        |     -2     |    NA
2012 q1 |   AT     |    0   |   1        |     -1     |    NA
2012 q2 |   AT     |    1   |   2        |      0     |    NA
2011 q1 |   BE     |    0   |   NA       |     NA     |    -3
2011 q3 |   BE     |    0   |   NA       |     NA     |    -2
2011 q4 |   BE     |    0   |   NA       |     NA     |    -1
2012 q1 |   BE     |    1   |   NA       |     NA     |     0
2012 q2 |   BE     |    0   |   NA       |     NA     |    +1

Here a reproducible example for the first dataset

library(zoo)

date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)

df <- data.frame(date, country, crisis)

Thank you.

In your expected output I don't see any variable being systematically 0 when crisis is 1, nor any sequence from -24 to 24, nor any sequence starting from 0. Can you explain how these columns are computed ? — moodymudskipper
– moodymudskipper, Commented Jan 9, 2020 at 15:44
Thank you very much for the comment. I removed the reference to +- 24, the variable is systematically 0 for each country - crisis pair. If you take the first appearance of the crisis (value 1) for the country AT the AT-crisis1 is equal to 0 when the first value of crisis is 1 — Jordan_b
– Jordan_b, Commented Jan 9, 2020 at 16:06

moodymudskipper · Accepted Answer · 2020-01-09 16:12:39Z

Here's a way :

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)
date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)
df <- data.frame(date, country, crisis)

df %>%
  mutate_at("date", as.character) %>% # as it's a factor in sample data
  group_split(country) %>%
  map_dfr(~imap_dfr(which(.$crisis==1), ~mutate(..3, key =.y, value = seq(1-.,len = nrow(..3))), .)) %>%
  unite(key, country, key, sep = "_crisis", remove = FALSE) %>%
  spread(key, value) %>%
  arrange(country, date)
#> # A tibble: 10 x 6
#>    date    country crisis AT_crisis1 AT_crisis2 BE_crisis1
#>    <chr>   <fct>    <dbl>      <dbl>      <dbl>      <dbl>
#>  1 2011 Q2 AT           0         -2         -4         NA
#>  2 2011 Q3 AT           0         -1         -3         NA
#>  3 2011 Q4 AT           1          0         -2         NA
#>  4 2012 Q1 AT           0          1         -1         NA
#>  5 2012 Q2 AT           1          2          0         NA
#>  6 2011 Q2 BE           0         NA         NA         -3
#>  7 2011 Q3 BE           0         NA         NA         -2
#>  8 2011 Q4 BE           0         NA         NA         -1
#>  9 2012 Q1 BE           1         NA         NA          0
#> 10 2012 Q2 BE           0         NA         NA          1

^{Created on 2020-01-09 by the reprex package (v0.3.0)}

The double map call is probably a bit intimidating, here's what it does :

The outside map call applies a transformation on each country dataframe created by split, and binds them together
the second map call loops on values of which(crisis), we use an imap function to keep track of the index too because we need it to name the column (store it in key at this step). For each which value we create a dataframe with a value column containing the sequence that will be moved to the new columns later, for this we need the data frame object, which we pass as a 3rd parameter, hence the last . of the line, and the ..3 in the call.

chinsoon12 · Accepted Answer · 2020-01-10 03:02:17Z

1

An option using data.table:

library(data.table)
rbindlist(setDT(df)[, {
        idx <- .SD[crisis==1L, which=TRUE]
        names(idx) <- paste0(.BY$country, "_crisis", seq_along(idx))

        .(.(c(country=.BY$country, .SD, 
            lapply(idx, function(k) seq_len(.N) - k))))
    },
    country]$V1, 
use.names=TRUE, fill=TRUE)

output:

    country    date crisis AT_crisis1 AT_crisis2 BE_crisis1
 1:      AT 2011 Q2      0         -2         -4         NA
 2:      AT 2011 Q3      0         -1         -3         NA
 3:      AT 2011 Q4      1          0         -2         NA
 4:      AT 2012 Q1      0          1         -1         NA
 5:      AT 2012 Q2      1          2          0         NA
 6:      BE 2011 Q2      0         NA         NA         -3
 7:      BE 2011 Q3      0         NA         NA         -2
 8:      BE 2011 Q4      0         NA         NA         -1
 9:      BE 2012 Q1      1         NA         NA          0
10:      BE 2012 Q2      0         NA         NA          1

answered Jan 10, 2020 at 3:02

chinsoon12

25.2k4 gold badges27 silver badges35 bronze badges

2 Comments

moodymudskipper Over a year ago

This is a great use case of nesting/unnesting with data.table!

chinsoon12 Over a year ago

I think MichaelChirico is working on it github.com/Rdatatable/data.table/pull/4156

Collectives™ on Stack Overflow

Generate sequence from a value using R

2 Answers 2

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related