
I have been struggling with what I guess is a simple task.

I have a dataset with two columns holding a start date and an end date. I want to extract all the months between the start and end dates and list them together in a new column of the data frame. The next step would be to create a dummy for each month listed in that column.

My input data look like this:

Lon      Lat      Year    Start_date     End_date
70.25    40.25    2000    10/01/2009     04/30/2010
70.75    40.25    2000    05/01/2010     08/30/2010
71.00    40.25    2000    07/07/2010     11/30/2010

This is what I would like to obtain:

Lon      Lat      Year    Start_date     End_date      Sequence
70.25    40.25    2000    10/01/2009     04/30/2010    10,11,12,1,2,3,4
70.75    40.25    2000    05/01/2010     08/30/2010    5,6,7,8
71.00    40.25    2000    07/07/2010     11/30/2010    7,8,9,10,11

The last column contains all the months (as numbers) between Start_date and End_date.
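A minimal base R sketch of how such a `Sequence` column could be built as an ordinary character column rather than a list. It assumes the dates are `mm/dd/yyyy` strings; `month_seq()` is a hypothetical helper, not part of any package. Each row's months are pasted into one comma-separated string, so the result is a plain vector that fits directly into the data frame.

```r
df <- data.frame(
  Lon = c(70.25, 70.75, 71.00),
  Lat = c(40.25, 40.25, 40.25),
  Year = c(2000L, 2000L, 2000L),
  Start_date = c("10/01/2009", "05/01/2010", "07/07/2010"),
  End_date = c("04/30/2010", "08/30/2010", "11/30/2010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: month numbers from the month of `from` to the month of `to`
month_seq <- function(from, to) {
  # Snap the start to the 1st of its month so seq() cannot skip a month
  # (e.g. Jan 31 + 1 month would roll over into March)
  first <- as.Date(format(from, "%Y-%m-01"))
  as.integer(format(seq(first, to, by = "month"), "%m"))
}

start <- as.Date(df$Start_date, format = "%m/%d/%Y")
end   <- as.Date(df$End_date, format = "%m/%d/%Y")

# One comma-separated string per row, stored as a character column (not a list)
df$Sequence <- vapply(seq_len(nrow(df)), function(i) {
  paste(month_seq(start[i], end[i]), collapse = ",")
}, character(1))
```

A string column is easy to display, but for the dummy step a long or wide numeric layout is usually more convenient than parsing the strings back.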

This is my attempt so far:

library(chron)  # seq.dates() comes from the chron package
sequence <- Map(seq.dates, start_date, end_date, by = "months", format = "%m/%d/%y")  # one date sequence per row

The code works fine and gives me a list with all the months from the start to the end date, which is what I was aiming for. However, I cannot work with the list afterwards: I do not find any good way to extract its values into a new column of the data frame while keeping the structure (the levels). I have tried almost every suggestion on Stack Overflow for extracting values from a list, and nothing works. So I want to start over and change perspective.

Is there another way to rewrite the code above so that it produces a new column attached to my data, or a vector, and NOT a list? Any help would be immensely appreciated. Thanks!
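One way to get a plain vector out of a per-row list while keeping the row structure (a sketch, not the only approach) is to reshape to long format: `lengths()` gives how many months each row has, `rep()` repeats each row that many times, and `unlist()` flattens the months into an ordinary integer column aligned with the repeated rows. It assumes `mm/dd/yyyy` date strings; `months_list` and `long` are hypothetical names.

```r
df <- data.frame(
  Lon = c(70.25, 70.75, 71.00),
  Lat = c(40.25, 40.25, 40.25),
  Year = c(2000L, 2000L, 2000L),
  Start_date = c("10/01/2009", "05/01/2010", "07/07/2010"),
  End_date = c("04/30/2010", "08/30/2010", "11/30/2010"),
  stringsAsFactors = FALSE
)

start <- as.Date(df$Start_date, format = "%m/%d/%Y")
end   <- as.Date(df$End_date, format = "%m/%d/%Y")

# A list with one integer vector of month numbers per row (like the Map() result)
months_list <- lapply(seq_len(nrow(df)), function(i) {
  first <- as.Date(format(start[i], "%Y-%m-01"))  # avoid month-end overflow in seq()
  as.integer(format(seq(first, end[i], by = "month"), "%m"))
})

# Repeat each row once per month, then attach the flattened months as a column
long <- df[rep(seq_len(nrow(df)), lengths(months_list)), ]
long$month <- unlist(months_list)
rownames(long) <- NULL
```

Every original column is kept, and `month` is a regular vector, so tools like `table()` or a pivot can take it from here.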

  • Wait, what you showed as the output you want looks like a list. So if not a list, what are you trying to get? Commented Jan 18, 2020 at 14:46

2 Answers


I am not entirely clear about your expected output, but if you want to create a dummy for each month, one way with the tidyverse is to convert the start and end dates to `Date`, extract the months between them, create a dummy column, and reshape the data to wide format.

library(tidyverse)

df %>%
  mutate_at(vars(ends_with("date")), as.Date, format = "%m/%d/%Y") %>%
  mutate(month = map2(Start_date, End_date,
                     ~as.integer(format(seq(.x, .y, by = "month"), "%m")))) %>%
  unnest(cols = month) %>%
  mutate(temp = 1) %>%
  pivot_wider(names_from = month, values_from = temp, 
             values_fill = list(temp = 0)) %>%
  select(names(df), as.character(1:12))

# A tibble: 3 x 17
#    Lon   Lat  Year Start_date End_date     `1`   `2`   `3`   `4`   `5`
#  <dbl> <dbl> <int> <date>     <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
#1  70.2  40.2  2000 2009-10-01 2010-04-30     1     1     1     1     0
#2  70.8  40.2  2000 2010-05-01 2010-08-30     0     0     0     0     1
#3  71    40.2  2000 2010-07-07 2010-11-30     0     0     0     0     0
# … with 7 more variables: `6` <dbl>, `7` <dbl>, `8` <dbl>, `9` <dbl>,
#   `10` <dbl>, `11` <dbl>, `12` <dbl>

data

df <- structure(list(Lon = c(70.25, 70.75, 71), Lat = c(40.25, 40.25, 
40.25), Year = c(2000L, 2000L, 2000L), Start_date = structure(c(3L, 
1L, 2L), .Label = c("05/01/2010", "07/07/2010", "10/01/2009"), class = "factor"), 
End_date = structure(1:3, .Label = c("04/30/2010", "08/30/2010", 
"11/30/2010"), class = "factor")), class = "data.frame", row.names = c(NA,-3L))

2 Comments

I have a similar problem here [stackoverflow.com/questions/62679046/…. Could you help?
@Belle Looks like you deleted the question.

We can use `spread` from tidyr instead, which is also available in tidyr versions older than 1.0.0, where `pivot_wider` does not exist.

library(dplyr)
library(tidyr)
library(purrr)  # provides map2()
df %>%
   mutate_at(vars(ends_with("date")), as.Date, format = "%m/%d/%Y") %>%
   mutate(month = map2(Start_date, End_date,
                      ~as.integer(format(seq(.x, .y, by = "month"), "%m")))) %>%
   unnest(cols = month) %>%
   mutate(temp = 1) %>% 
   spread(month, temp, fill = 0)

data

df <- structure(list(Lon = c(70.25, 70.75, 71), Lat = c(40.25, 40.25, 
40.25), Year = c(2000L, 2000L, 2000L), Start_date = structure(c(3L, 
1L, 2L), .Label = c("05/01/2010", "07/07/2010", "10/01/2009"), class = "factor"), 
End_date = structure(1:3, .Label = c("04/30/2010", "08/30/2010", 
"11/30/2010"), class = "factor")), class = "data.frame", row.names = c(NA,-3L))
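The unnest-then-spread pattern can also be reproduced in base R, which sidesteps tidyr versions entirely. This is a sketch under the assumption of `mm/dd/yyyy` date strings; `long`, `tab`, and `dummies` are hypothetical names. It builds a long table of (row id, month) pairs and cross-tabulates it with `table()`; fixing the factor levels to `1:12` guarantees a column for every month, even one no row covers.

```r
df <- data.frame(
  Lon = c(70.25, 70.75, 71.00),
  Lat = c(40.25, 40.25, 40.25),
  Year = c(2000L, 2000L, 2000L),
  Start_date = c("10/01/2009", "05/01/2010", "07/07/2010"),
  End_date = c("04/30/2010", "08/30/2010", "11/30/2010"),
  stringsAsFactors = FALSE
)

start <- as.Date(df$Start_date, format = "%m/%d/%Y")
end   <- as.Date(df$End_date, format = "%m/%d/%Y")

months_list <- lapply(seq_len(nrow(df)), function(i) {
  first <- as.Date(format(start[i], "%Y-%m-01"))  # avoid month-end overflow in seq()
  as.integer(format(seq(first, end[i], by = "month"), "%m"))
})

# Long format: one (id, month) pair per covered month (the "unnest" step)
long <- data.frame(
  id = factor(rep(seq_len(nrow(df)), lengths(months_list)), levels = seq_len(nrow(df))),
  month = factor(unlist(months_list), levels = 1:12)
)

# Cross-tabulate (the "spread" step); "> 0" turns counts into 0/1 dummies
tab <- table(long$id, long$month)
dummies <- matrix(as.integer(tab > 0), nrow = nrow(tab),
                  dimnames = list(NULL, colnames(tab)))

out <- cbind(df, dummies)
```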
