Right now, my dataset is in wide format, meaning I have one row per person, but I want a long dataset, with multiple rows per person. I have two date variables, ADATE and DDATE, that I want to use as my start and end points, respectively. For example, if someone's ADATE is 02/04/10 and DDATE is 02/07/10, I want 4 rows:

Have:

ID ADATE     DDATE     
1  02/04/10  02/07/10 

Want:

ID ADATE     DDATE     NEW_DATE
1  02/04/10  02/07/10  02/04/10
1  02/04/10  02/07/10  02/05/10
1  02/04/10  02/07/10  02/06/10
1  02/04/10  02/07/10  02/07/10

I have multiple datasets that I want to do this for, and I have written code that works for every dataset except one, and I'm not sure why. This is my attempt and the error I get:

jan15_long <- chf_jan15 %>%
  mutate(NEW_DATE = as.Date(ADATE)) %>%
  group_by(ID) %>%
  complete(NEW_DATE = seq.Date(as.Date(ADATE), as.Date(DDATE), by = "day")) %>%
  fill(vars) %>%
  ungroup()
Error in seq.Date(as.Date(ADATE), as.Date(DDATE), by = "day") : 
  'from' must be of length 1

The above code gives me what I want and runs perfectly for every other dataset I have (10 out of 11).

Is there a better way to do this? dplyr makes the most sense to me, so hopefully there's a solution to this.

1 Answer

If there is more than one row, seq has to be called once per row, because its 'from' and 'to' arguments must each be of length 1. We can loop over the rows with map2. Also, given the format of the 'DATE' columns, as.Date needs a format argument, i.e. as.Date(ADATE, "%m/%d/%y") (assuming month/day/year format).

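library(dplyr)
library(purrr)
library(tidyr)      # unnest()
library(lubridate)  # mdy()
chf_jan15 %>%
    # parse both date columns as month/day/year
    mutate_at(vars(ends_with("DATE")), mdy) %>%
    # build one daily sequence per row, stored in a list column
    mutate(random_date = map2(ADATE, DDATE, seq, by = "day")) %>%
    unnest(c(random_date))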
# A tibble: 4 x 4
#     ID ADATE      DDATE      random_date
#  <int> <date>     <date>     <date>     
#1     1 2010-02-04 2010-02-07 2010-02-04 
#2     1 2010-02-04 2010-02-07 2010-02-05 
#3     1 2010-02-04 2010-02-07 2010-02-06 
#4     1 2010-02-04 2010-02-07 2010-02-07 
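
For readers who would rather stay within dplyr and tidyr, the same row-by-row idea can be sketched with rowwise() in place of map2. This is only an illustration under assumptions: the two-row `demo` data frame is made up to mirror the shape of chf_jan15, the output column is named NEW_DATE as in the question, and across() requires dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row input in the same shape as chf_jan15
demo <- data.frame(
  ID    = c(1L, 2L),
  ADATE = c("02/04/10", "03/01/10"),
  DDATE = c("02/07/10", "03/02/10")
)

demo_long <- demo %>%
  # parse both date columns as month/day/year
  mutate(across(ends_with("DATE"), ~ as.Date(.x, format = "%m/%d/%y"))) %>%
  # treat every row as its own group so seq() sees length-1 inputs
  rowwise() %>%
  mutate(NEW_DATE = list(seq(ADATE, DDATE, by = "day"))) %>%
  ungroup() %>%
  # expand the list column into one row per day
  unnest(NEW_DATE)

demo_long
```

Wrapping the sequence in list() is what lets a single row hold a whole vector of dates until unnest() expands it.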

If the data has only a single row, complete should work once the columns are converted to Date class:

library(tidyr)
chf_jan15 %>%
   mutate_at(vars(ends_with("DATE")), as.Date, format = "%m/%d/%y") %>%
   mutate(NEW_DATE = ADATE) %>%      
   complete(NEW_DATE = seq(ADATE, DDATE, by = 'day')) %>%
   fill(c(ID, ADATE, DDATE))
# A tibble: 4 x 4
#  NEW_DATE      ID ADATE      DDATE     
#  <date>     <int> <date>     <date>    
#1 2010-02-04     1 2010-02-04 2010-02-07
#2 2010-02-05     1 2010-02-04 2010-02-07
#3 2010-02-06     1 2010-02-04 2010-02-07
#4 2010-02-07     1 2010-02-04 2010-02-07
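
To see why the single-row condition matters, here is a small base R sketch. The two date vectors are made-up values, not taken from the question's data. Passing vectors of length 2 to seq raises an error, while calling seq once per pair of dates works.

```r
# Hypothetical start and end dates for two rows
adate <- as.Date(c("02/04/10", "03/01/10"), format = "%m/%d/%y")
ddate <- as.Date(c("02/07/10", "03/02/10"), format = "%m/%d/%y")

# With length-2 inputs seq() stops with an error; capture its message
msg <- tryCatch(
  {
    seq(adate, ddate, by = "day")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg

# One call per pair of dates succeeds and returns a list of sequences
ranges <- Map(seq, adate, ddate, by = "day")
lengths(ranges)
```

The same thing happens inside complete: within a group that holds more than one row, ADATE and DDATE are vectors, so seq receives more than one 'from' value.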

If there is a single row for each 'ID', then we can group_split and use complete:

chf_jan15 %>%
    mutate_at(vars(ends_with("DATE")), as.Date, format = "%m/%d/%y") %>%
    mutate(NEW_DATE = ADATE) %>%
    group_split(ID) %>%
    map_dfr(~ .x %>%
                complete(NEW_DATE = seq(ADATE, DDATE, by = 'day')) %>%
                fill(c(ID, ADATE, DDATE)))
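
Since this approach depends on each 'ID' having exactly one row, it may be worth checking that assumption first. The question does not say why one dataset fails, so repeated IDs are only a possible cause, not a confirmed one. A minimal check, using a made-up `demo` data frame and a hypothetical column name `n_rows`:

```r
library(dplyr)

# Hypothetical data in which ID 1 appears twice
demo <- data.frame(
  ID    = c(1L, 1L, 2L),
  ADATE = c("02/04/10", "02/10/10", "03/01/10"),
  DDATE = c("02/07/10", "02/12/10", "03/02/10")
)

# List every ID that has more than one row
dup_ids <- demo %>%
  count(ID, name = "n_rows") %>%
  filter(n_rows > 1)

dup_ids
```

If this returns any rows for the real data, the per-row approach with map2 is the safer choice, because it does not depend on IDs being unique.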

data

chf_jan15 <- structure(list(ID = 1L, ADATE = "02/04/10", 
    DDATE = "02/07/10"), class = "data.frame", row.names = c(NA, 
-1L))

10 Comments

Hi akrun. I have to do this within each group. Should I add a group_by before mutate_at? Thanks!
@user122514 It is not needed, because map2 is looping over each row.
Hi akrun. The first chunk of code works, but the second one doesn't. I'm still getting an error saying that 'from' must be of length 1.
@user122514 Sorry, it needs a group_by before complete. I have updated the code; please check.
Thanks! I will try to read up on map2 as I don't really use the purrr package.