Sequential generation of columns based on multiple existing columns

Question

I have a data frame that looks like this:

 df <- data.frame(project = c("A", "B"),
                  no_dwellings = c(150, 180),
                  first_occupancy = c(2020, 2019))

  project no_dwellings first_occupancy
1       A          150            2020
2       B          180            2019

project is a column identifying residential building areas, no_dwellings denotes how many dwellings are ultimately being constructed in these areas, and first_occupancy is an estimate as to when the first residents will start moving into the newly constructed apartments.

I need to incorporate this information into a population forecast. The best estimate we have is that in each year (starting with first occupancy), 60 of the dwellings are being moved into. Thus, I need to sequentially generate columns that combine the information from first_occupancy and no_dwellings to indicate for each year how many dwellings are likely to be moved into. Since the number of constructed dwellings does not necessarily divide by 60, the remainder needs to be put into the last column for the respective project.

This is what I expect my data frame to look like for further processing:

  project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1       A          150            2020         0        60        60        30
2       B          180            2019        60        60        60         0

Jaap · Accepted Answer · 2018-03-05 09:15:40Z

Using the data.table-package you could approach this as follows:

library(data.table)

setDT(df)[, .(yr = first_occupancy:(first_occupancy + no_dwellings %/% 60),
              dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60))
          , by = .(project, no_dwellings, first_occupancy)
          ][, dcast(.SD, project + no_dwellings + first_occupancy ~ paste0('year_',yr), value.var = 'dw', fill = 0)]

which gives:

   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1:       A          150            2020         0        60        60        30
2:       B          180            2019        60        60        60         0

The same logic applied with the tidyverse:

library(dplyr)
library(tidyr)

df %>% 
  group_by(project) %>% 
  do(data.frame(no_dwellings = .$no_dwellings, first_occupancy = .$first_occupancy,
                yr = paste0('year_',.$first_occupancy:(.$first_occupancy + .$no_dwellings %/% 60)),
                dw = c(rep(60, .$no_dwellings %/% 60), .$no_dwellings %% 60))) %>% 
  spread(yr, dw, fill = 0)

fauxneticien · Accepted Answer · 2018-03-05 09:15:30Z

It's quite straightforward to generate a long data frame of what you require, which we can do with make_pop_df. All you have to do then is to use the function in a mutate call, storing the resulting data frame in the very handy 'list columns', which tibbles allow, use unnest to get the data frames out of the list column, then tidyr::spread to display the data in a wide format.

library(tidyverse)

make_pop_df <- function(no_dwellings, first_year, decay = -60) {
    seq(from = no_dwellings, to = 0, by = decay) %>%
    tibble(pop  = ., year = first_year + 1:length(.) - 1
    )
}

df %>%
    group_by(project) %>% 
    mutate(pop_df = list(make_pop_df(no_dwellings, first_occupancy))) %>% 
    unnest(pop_df) %>%
    spread(key = year, value = pop)

www · Accepted Answer · 2018-03-05 09:23:13Z

Another tidyverse solution using the complete function to create all years and then filling the numbers.

library(dplyr)
library(tidyr)

df2 <- df %>%
  mutate(year = first_occupancy) %>%
  group_by(project) %>%
  complete(nesting(no_dwellings, first_occupancy), 
         year = full_seq(c(year, min(year) + unique(no_dwellings) %/% 60), period = 1)) %>%
  mutate(number = c(rep(60, unique(no_dwellings) %/% 60), unique(no_dwellings) %% 60),
         year = paste("year", year, sep = "_")) %>%
  spread(year, number, fill = 0) %>%
  ungroup()
df2
# # A tibble: 2 x 7
#   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
#   <fct>          <dbl>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 A               150.           2020.        0.       60.       60.       30.
# 2 B               180.           2019.       60.       60.       60.        0.

Collectives™ on Stack Overflow

Sequential generation of columns based on multiple existing columns

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related