I have a dataframe like this in R:

Start date  End date    Date 1      Date 2      Date 3      Date 4
11/12/2018  29/11/2019  08/03/2021  NA          NA          NA
07/03/2018  24/04/2019  08/03/2021  12/09/2016  NA          NA
04/06/2018  23/04/2019  08/03/2021  02/10/2017  05/10/2018  NA
26/07/2018  29/08/2019  08/03/2021  03/08/2015  02/10/2017  23/01/2017

I want to create a new column that is 1 if any of Date 1, Date 2, Date 3 or Date 4 falls between Start date and End date, and 0 otherwise, as in the table below:

Start date  End date    Date 1      Date 2      Date 3      Date 4      Change
11/12/2018  29/11/2019  08/03/2021  NA          NA          NA          0
07/03/2018  24/04/2019  08/03/2021  12/09/2016  NA          NA          0
04/06/2018  23/04/2019  08/03/2021  02/10/2017  05/10/2018  NA          1
26/07/2018  29/08/2019  08/03/2021  03/08/2015  02/10/2017  23/01/2017  0

Does anyone have a suggestion on how to solve this? Thank you :)

    Please post your data as the output of the command dput(your_dataframe) so we can access it more easily. Also include any code you've tried and any errors you've gotten. Commented Apr 12, 2021 at 17:15

2 Answers

It'll make it much easier for people to help you if you can post code / data which we can run directly. The easiest way to do this is to use a handy R function called dput, which generates instructions to exactly recreate any R object. So you might run dput(MY_DATA), or if your data is much larger than needed to demonstrate your question, dput(head(MY_DATA)) to get the first six rows, and paste the output of that into your question. </PSA>
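
As a small illustration of the dput advice above, here is a sketch using a hypothetical two-row data frame (my_df and its column names are made up for the example, not taken from the question). It prints the code that rebuilds the object, then checks that evaluating that code gives back an equivalent data frame.

```r
# Hypothetical small data frame standing in for your real data
my_df <- data.frame(
  start = c("11/12/2018", "07/03/2018"),
  end   = c("29/11/2019", "24/04/2019"),
  stringsAsFactors = FALSE
)

# Prints R code that recreates my_df; this is what you would paste into a question
dput(my_df)

# Round trip: capture the printed code as text and evaluate it to rebuild the object
rebuilt <- eval(parse(text = capture.output(dput(my_df))))

# TRUE if the rebuilt object matches the original
isTRUE(all.equal(rebuilt, my_df))
```

For a large data frame, dput(head(my_df)) keeps the pasted output short.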


Here's code to generate your example data:

my_data <- data.frame(
  stringsAsFactors = FALSE,
        Start.date = c("11/12/2018", "07/03/2018", "04/06/2018", "26/07/2018"),
          End.date = c("29/11/2019", "24/04/2019", "23/04/2019", "29/08/2019"),
            Date.1 = c("08/03/2021", "08/03/2021", "08/03/2021", "08/03/2021"),
            Date.2 = c(NA, "12/09/2016", "02/10/2017", "03/08/2015"),
            Date.3 = c(NA, NA, "05/10/2018", "02/10/2017"),
            Date.4 = c(NA, NA, NA, "23/01/2017")
)

Here's a tidyverse approach: first convert your day/month/year strings to R's Date type using lubridate::dmy, then check whether each of Date.1 through Date.4 falls between Start.date and End.date (inclusive), and finally flag the row if any of those checks is 1 (within range). Note that this version overwrites the Date columns with the 1/0 flags.

library(dplyr); library(lubridate)
my_data %>%
  mutate(across(.cols = everything(), .fns = dmy)) %>%  # parse every column as a Date
  mutate(across(.cols = starts_with("Date"),
                .fns = ~coalesce(.x >= Start.date & .x <= End.date, FALSE)*1)) %>%
  mutate(Change = pmax(Date.1, Date.2, Date.3, Date.4))

coalesce(..., FALSE) is used here to treat NA as FALSE.

(...)*1 converts TRUE/FALSE to 1/0.

pmax(...) takes the row-wise maximum of the 1/0 flags, i.e. "are there any 1's?"
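
To see the three pieces in isolation, here is a minimal sketch on plain vectors. The values are taken from the third row of the example data, and the variable names (start, end, d, in_range, flag) are only for illustration.

```r
library(dplyr)

start <- as.Date("2018-06-04")
end   <- as.Date("2019-04-23")
d     <- as.Date(c("2021-03-08", "2018-10-05", NA))

# Comparing against an NA date yields NA rather than FALSE
in_range <- d >= start & d <= end

# coalesce() swaps the NA for FALSE; multiplying by 1 turns the logicals into numbers
flag <- coalesce(in_range, FALSE) * 1
flag

# pmax() works element-wise across its arguments, so with one argument per
# column it returns the largest flag in each row
pmax(c(0, 0), c(0, 1), c(0, 0))
```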


Edit: alternative to leave Date columns intact:

my_data %>%
  mutate(across(.cols = everything(), .fns = dmy)) %>%  # parse every column as a Date
  mutate(across(.cols = starts_with("Date"), 
                .names = "Check_{.col}",
                .fns = ~coalesce(.x >= Start.date & .x <= End.date, FALSE)*1)) %>%
  rowwise() %>%
  mutate(Change = max(c_across(starts_with("Check")))) %>%
  select(-starts_with("Check"))

  Start.date End.date   Date.1     Date.2     Date.3     Date.4     Change
  <date>     <date>     <date>     <date>     <date>     <date>      <dbl>
1 2018-12-11 2019-11-29 2021-03-08 NA         NA         NA              0
2 2018-03-07 2019-04-24 2021-03-08 2016-09-12 NA         NA              0
3 2018-06-04 2019-04-23 2021-03-08 2017-10-02 2018-10-05 NA              1
4 2018-07-26 2019-08-29 2021-03-08 2015-08-03 2017-10-02 2017-01-23      0

0
library(tidyverse)
library(lubridate)

df <- read.table(textConnection("start_date;end_date;date_1;date_2;date_3;date_4
11/12/2018;29/11/2019;08/03/2021;NA;NA;NA
07/03/2018;24/04/2019;08/03/2021;12/09/2016;NA;NA
04/06/2018;23/04/2019;08/03/2021;02/10/2017;05/10/2018;NA
26/07/2018;29/08/2019;08/03/2021;03/08/2015;02/10/2017;23/01/2017"),
                 sep=";",
                 header = TRUE)
df %>%
  mutate(
    across(everything(), lubridate::dmy),
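    # compare every date column, including date_4, against the range (bounds excluded)
    change = ((date_1 > start_date & date_1 < end_date) |
                (date_2 > start_date & date_2 < end_date) |
                (date_3 > start_date & date_3 < end_date) |
                (date_4 > start_date & date_4 < end_date)
    ) %>%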
      coalesce(FALSE) %>%
      as.integer()
  )
#>   start_date   end_date     date_1     date_2     date_3     date_4 change
#> 1 2018-12-11 2019-11-29 2021-03-08       <NA>       <NA>       <NA>      0
#> 2 2018-03-07 2019-04-24 2021-03-08 2016-09-12       <NA>       <NA>      0
#> 3 2018-06-04 2019-04-23 2021-03-08 2017-10-02 2018-10-05       <NA>      1
#> 4 2018-07-26 2019-08-29 2021-03-08 2015-08-03 2017-10-02 2017-01-23      0
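
If the number of date columns may vary, spelling out one comparison per column gets tedious. Below is a base R sketch of the same check that loops over every column whose name starts with "date_". It is an alternative illustration rather than either answer's method, and it assumes inclusive bounds and the column naming used in this answer.

```r
# Same example data as above, built directly so the snippet is self-contained
df <- data.frame(
  start_date = c("11/12/2018", "07/03/2018", "04/06/2018", "26/07/2018"),
  end_date   = c("29/11/2019", "24/04/2019", "23/04/2019", "29/08/2019"),
  date_1     = c("08/03/2021", "08/03/2021", "08/03/2021", "08/03/2021"),
  date_2     = c(NA, "12/09/2016", "02/10/2017", "03/08/2015"),
  date_3     = c(NA, NA, "05/10/2018", "02/10/2017"),
  date_4     = c(NA, NA, NA, "23/01/2017"),
  stringsAsFactors = FALSE
)

# Parse every column from day/month/year text into Date
df[] <- lapply(df, as.Date, format = "%d/%m/%Y")

# Pick up however many date_* columns exist (assumed naming convention)
date_cols <- grep("^date_", names(df), value = TRUE)

# One logical column per date column; NA dates count as "not in range"
in_range <- do.call(cbind, lapply(df[date_cols], function(d) {
  !is.na(d) & d >= df$start_date & d <= df$end_date
}))

# 1 if at least one date in the row falls inside the range, 0 otherwise
df$change <- as.integer(rowSums(in_range) > 0)
df$change
```

Switching to strict bounds only requires replacing >= and <= with > and <.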
