1

I have a dataset that looks like this:

date    |country   | crisis | 
2011 q2 |   AT     |    0   |
2011 q3 |   AT     |    0   |
2011 q4 |   AT     |    1   |
2012 q1 |   AT     |    0   |
2012 q2 |   AT     |    1   | 
2011 q1 |   BE     |    0   |
2011 q3 |   BE     |    0   |
2011 q4 |   BE     |    0   |
2012 q1 |   BE     |    1   |
2012 q2 |   BE     |    0   | 

And I would like to compute another variable for each country-crisis pair that is 0 when crisis is 1 and then create a sequence starting from zero, like this

date    |country   | crisis | AT_crisis1 | AT_crisis2 | BE_crisis 1
2011 q2 |   AT     |    0   |  -2        |     -4     |    NA
2011 q3 |   AT     |    0   |  -1        |     -3     |    NA
2011 q4 |   AT     |    1   |   0        |     -2     |    NA
2012 q1 |   AT     |    0   |   1        |     -1     |    NA
2012 q2 |   AT     |    1   |   2        |      0     |    NA
2011 q1 |   BE     |    0   |   NA       |     NA     |    -3
2011 q3 |   BE     |    0   |   NA       |     NA     |    -2
2011 q4 |   BE     |    0   |   NA       |     NA     |    -1
2012 q1 |   BE     |    1   |   NA       |     NA     |     0
2012 q2 |   BE     |    0   |   NA       |     NA     |    +1

Here a reproducible example for the first dataset

library(zoo)

date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)

df <- data.frame(date, country, crisis)

Thank you.

3
  • 1
    In your expected output I don't see any variable being systematically 0 when crisis is 1, nor any sequence from -24 to 24, nor any sequence starting from 0. Can you explain how these columns are computed ? Commented Jan 9, 2020 at 15:44
  • Thank you very much for the comment. I removed the reference to +- 24, the variable is systematically 0 for each country - crisis pair. If you take the first appearance of the crisis (value 1) for the country AT the AT-crisis1 is equal to 0 when the first value of crisis is 1 Commented Jan 9, 2020 at 16:06
  • I got it now :), see answer below Commented Jan 9, 2020 at 16:12

2 Answers 2

3

Here's a way :

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)
date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)
df <- data.frame(date, country, crisis)

df %>%
  mutate_at("date", as.character) %>% # as it's a factor in sample data
  group_split(country) %>%
  map_dfr(~imap_dfr(which(.$crisis==1), ~mutate(..3, key =.y, value = seq(1-.,len = nrow(..3))), .)) %>%
  unite(key, country, key, sep = "_crisis", remove = FALSE) %>%
  spread(key, value) %>%
  arrange(country, date)
#> # A tibble: 10 x 6
#>    date    country crisis AT_crisis1 AT_crisis2 BE_crisis1
#>    <chr>   <fct>    <dbl>      <dbl>      <dbl>      <dbl>
#>  1 2011 Q2 AT           0         -2         -4         NA
#>  2 2011 Q3 AT           0         -1         -3         NA
#>  3 2011 Q4 AT           1          0         -2         NA
#>  4 2012 Q1 AT           0          1         -1         NA
#>  5 2012 Q2 AT           1          2          0         NA
#>  6 2011 Q2 BE           0         NA         NA         -3
#>  7 2011 Q3 BE           0         NA         NA         -2
#>  8 2011 Q4 BE           0         NA         NA         -1
#>  9 2012 Q1 BE           1         NA         NA          0
#> 10 2012 Q2 BE           0         NA         NA          1

Created on 2020-01-09 by the reprex package (v0.3.0)

The double map call is probably a bit intimidating, here's what it does :

  • The outside map call applies a transformation on each country dataframe created by split, and binds them together
  • the second map call loops on values of which(crisis), we use an imap function to keep track of the index too because we need it to name the column (store it in key at this step). For each which value we create a dataframe with a value column containing the sequence that will be moved to the new columns later, for this we need the data frame object, which we pass as a 3rd parameter, hence the last . of the line, and the ..3 in the call.
Sign up to request clarification or add additional context in comments.

Comments

1

An option using data.table:

library(data.table)
rbindlist(setDT(df)[, {
        idx <- .SD[crisis==1L, which=TRUE]
        names(idx) <- paste0(.BY$country, "_crisis", seq_along(idx))

        .(.(c(country=.BY$country, .SD, 
            lapply(idx, function(k) seq_len(.N) - k))))
    },
    country]$V1, 
use.names=TRUE, fill=TRUE)

output:

    country    date crisis AT_crisis1 AT_crisis2 BE_crisis1
 1:      AT 2011 Q2      0         -2         -4         NA
 2:      AT 2011 Q3      0         -1         -3         NA
 3:      AT 2011 Q4      1          0         -2         NA
 4:      AT 2012 Q1      0          1         -1         NA
 5:      AT 2012 Q2      1          2          0         NA
 6:      BE 2011 Q2      0         NA         NA         -3
 7:      BE 2011 Q3      0         NA         NA         -2
 8:      BE 2011 Q4      0         NA         NA         -1
 9:      BE 2012 Q1      1         NA         NA          0
10:      BE 2012 Q2      0         NA         NA          1

2 Comments

This is a great use case of nesting/unnesting with data.table!
I think MichaelChirico is working on it github.com/Rdatatable/data.table/pull/4156

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.