
I have a data set with a start date and an end date. Some of the end dates are missing, and I want to fill them with a case date. As you can see below, I have tried three different approaches and none of them works.

startDay <- as.Date(c("2015-01-01","2015-03-01","2016-07-15","2016-08-02"), "%Y-%m-%d")
endDay <- as.Date(c("2018-01-01",NA,"2018-03-05",NA), "%Y-%m-%d")
id <- 1:4
dt <- data.frame(id, startDay, endDay)
dt
str(dt)

dt$caseDay <- as.Date("2018-07-20", "%Y-%m-%d")  
str(dt)
dt

This one changes the class of my variable from date to numeric:

dt$EndDay1 <-
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay)
str(dt)
dt
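The class is lost because, as the base R documentation for ifelse() notes, the attributes (including the class) of the result are taken from the logical test vector, not from yes or no. A minimal sketch on two toy dates (the names x, fill and res are illustrative only):

```r
# Toy vectors (hypothetical names) mirroring endDay and caseDay
x    <- as.Date(c("2018-01-01", NA))
fill <- as.Date("2018-07-20")

# ifelse() builds its result from the logical test, so the Date class is dropped
res <- ifelse(is.na(x), fill, x)
class(res)  # "numeric"
res         # 17532 17732, i.e. days since 1970-01-01
```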

This one generates an error message.

dt$EndDay2 <-as.Date(
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay), "%Y-%m-%d")
str(dt)
dt
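The error comes from argument matching rather than from the values themselves: for a numeric input, as.Date() dispatches to as.Date.numeric(x, origin, ...), so the unnamed "%Y-%m-%d" is taken as origin, and R then fails to parse that string as a date. A small sketch contrasting the two calls (days and err are illustrative names):

```r
days <- 17532  # a Date that has lost its class (2018-01-01)

# The unnamed second argument is matched to `origin`, not `format`,
# so this call errors while trying to parse "%Y-%m-%d" as a date
err <- tryCatch(as.Date(days, "%Y-%m-%d"), error = function(e) e)
inherits(err, "error")  # TRUE

# Naming the origin explicitly turns the day count back into a Date
as.Date(days, origin = "1970-01-01")  # "2018-01-01"
```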

If my research/understanding of related posts is correct, version 3 below should resolve the problem. However, it converts everything to missing values.

dt$EndDay3 <-as.Date(as.character(
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay)), "%Y-%m-%d")
str(dt)
dt
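Version 3 yields NA because as.character() turns the bare day count into a string such as "17532", which does not match the "%Y-%m-%d" pattern. A quick check (s is an illustrative name):

```r
# The numeric day count becomes "17532", not "2018-01-01"
s <- as.character(17532)
s

# Parsing it with a year-month-day format therefore fails and yields NA
as.Date(s, "%Y-%m-%d")  # NA
```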

Any suggestions on how to solve this? Thanks.

  • Instead of the format string, you need as.Date(ifelse(...), origin = "1970-01-01"). Commented Dec 22, 2018 at 17:41
  • Thanks, this is very useful because it helped me see the mistake in my script. Commented Dec 22, 2018 at 19:54
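Applied to the question's data, the origin fix from the first comment might look like the sketch below. It rebuilds the example data frame so it runs on its own; the column name EndDay2 is reused from the question.

```r
# Rebuild the example data from the question
dt <- data.frame(
  id       = 1:4,
  startDay = as.Date(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay   = as.Date(c("2018-01-01", NA, "2018-03-05", NA))
)
dt$caseDay <- as.Date("2018-07-20")

# ifelse() still returns day counts; origin tells as.Date() how to read them
dt$EndDay2 <- as.Date(ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay),
                      origin = "1970-01-01")
str(dt)
```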

1 Answer


Here's another idea:

library(dplyr)
library(lubridate)

We'll use lubridate::ymd and dplyr::case_when (see this lubridate cheat sheet for more goodies).

Your data:

dt <- tibble(
  startDay = ymd(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay = ymd(c("2018-01-01", NA, "2018-03-05", NA))
)

The caseDay:

caseDay <- ymd("2018-07-20")

Use case_when:

dt <- dt %>%
  mutate(endDay = case_when(is.na(endDay) ~ caseDay,
                            TRUE ~ endDay))

(Note: the TRUE ~ branch acts as the default, used when none of the preceding conditions is TRUE.)

Result:

> dt
# A tibble: 4 x 2
  startDay   endDay    
  <date>     <date>    
1 2015-01-01 2018-01-01
2 2015-03-01 2018-07-20
3 2016-07-15 2018-03-05
4 2016-08-02 2018-07-20
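For this particular "fill the NAs" case, dplyr also has two shorter options that keep the Date class: coalesce(), which takes the first non-missing value, and if_else(), a type-stable counterpart of base ifelse(). A sketch assuming a reasonably recent dplyr; base as.Date() stands in for lubridate::ymd() here only to keep dependencies down, and the two new column names are illustrative:

```r
library(dplyr)

dt <- tibble(
  startDay = as.Date(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay   = as.Date(c("2018-01-01", NA, "2018-03-05", NA))
)
caseDay <- as.Date("2018-07-20")

dt <- dt %>%
  mutate(
    # coalesce(): first non-missing value; the length-1 caseDay is recycled
    endDay_coalesce = coalesce(endDay, caseDay),
    # if_else(): like ifelse() but keeps the Date class of true/false
    endDay_if_else  = if_else(is.na(endDay), caseDay, endDay)
  )
dt
```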

2 Comments

Thanks! Someone posted and then deleted another simple solution. I have included it below in case other people find it useful:

dt$EndDay1 <- dt$endDay
dt$EndDay1[is.na(dt$endDay)] <- dt$caseDay[is.na(dt$endDay)]
str(dt)
dt
@TCS nice, post it as a solution (you are allowed to do this :-) )
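The solution quoted in the comments works because subset assignment on a Date vector goes through base R's `[<-` method for Date, which keeps the class. The same idea fits on one line with base replace(); a sketch using a trimmed copy of the question's data:

```r
dt <- data.frame(
  id     = 1:4,
  endDay = as.Date(c("2018-01-01", NA, "2018-03-05", NA))
)
caseDay <- as.Date("2018-07-20")

# replace() assigns with `[<-`, which dispatches to the Date method,
# so the result stays a Date; caseDay is recycled over the NA positions
dt$EndDay1 <- replace(dt$endDay, is.na(dt$endDay), caseDay)
str(dt)
```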
