0

I would appreciate a hint about which command to use for the following. I want to compute population estimates for each city in column "Name" and for every year in column "Year". The column "growth" provides the growth factor to apply. As a formula, it would be:

Population[Lucknow,2030] = Population[Lucknow, 2020] * growth[2030]

and so on. The data frame is:

df <- data.frame(YEAR=c(2020,2020,2020,2030,2040,2050), NAME=c("Lucknow","Delhi","Hyderadabad",NA,NA,NA), POPULATION=c(3704, 29274,10275,NA,NA,NA), growth=c(1.0,1.0,1.0,1.10,1.18,1.24))
Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030                <NA>                   NA     1.1000000
2040                <NA>                   NA     1.1800000
2050                <NA>                   NA     1.2400000

Edit: Following what Dom (thank you!) wrote below, the input would be:

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA_integer_, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)
    year city          pop growth
   <dbl> <chr>       <dbl>  <dbl>
 1  2020 Lucknow      3704   1   
 2  2020 Delhi       29274   1   
 3  2020 Hyderadabad 10275   1   
 4  2030 Lucknow        NA   1.1 
 5  2030 Delhi          NA   1.1 
 6  2030 Hyderadabad    NA   1.1 
 7  2040 Lucknow        NA   1.18
 8  2040 Delhi          NA   1.18
 9  2040 Hyderadabad    NA   1.18
10  2050 Lucknow        NA   1.24
11  2050 Delhi          NA   1.24
12  2050 Hyderadabad    NA   1.24

The output should look like:

Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030             Lucknow               4074.4     1.1000000
2030               Delhi              32201.4     1.1000000
2030           Hyderabad              11302.5     1.1000000
....

How do I fill the NAs in the tibble?

I made various attempts with merge and dplyr::mutate, but could not work out what to do here, given that this is a vector operation. I'd be grateful for any pointers to the correct command for such a basic operation.

Thanks!

  • I need to go to bed, but this is what your data should look like: df <- tibble( year = rep(c(2020,2030,2040,2050), each = 3), city = rep(c("Lucknow","Delhi","Hyderadabad"), times = 4), pop = c(3704, 29274,10275, rep(NA_integer_, times = 9)), growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) ) Commented Dec 7, 2018 at 11:30
  • I think this is going to be a simple group_by operation but I haven't been able to find the simple solution. Commented Dec 7, 2018 at 11:31
  • Thank you, Dom. I have included your tibble suggestion in the original post. How do I fill the NA values then? (My original data set is much larger) Commented Dec 7, 2018 at 12:00

3 Answers

2

Using dplyr:

library(dplyr)
df %>%
  arrange(city, year) %>%          # sort so each city's 2020 row comes first
  group_by(city) %>%
  mutate(pop = pop[1] * growth)    # scale the city's first (2020) population

# A tibble: 12 x 4
# Groups:   city [3]
    year city           pop growth
   <dbl> <chr>        <dbl>  <dbl>
 1  2020 Delhi       29274    1   
 2  2030 Delhi       32201.   1.1 
 3  2040 Delhi       34543.   1.18
 4  2050 Delhi       36300.   1.24
 5  2020 Hyderadabad 10275    1   
 6  2030 Hyderadabad 11303.   1.1 
 7  2040 Hyderadabad 12124.   1.18
 8  2050 Hyderadabad 12741    1.24
 9  2020 Lucknow      3704    1   
10  2030 Lucknow      4074.   1.1 
11  2040 Lucknow      4371.   1.18
12  2050 Lucknow      4593.   1.24
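
A variant of the same idea that does not depend on row order is sketched below. It picks each city's base population by matching the year rather than taking the first row. The name base_year and the assumption that every city has exactly one non-missing 2020 row are illustrative and not part of the answer above.

```r
library(dplyr)

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA_real_, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# Assumption: 2020 is the year whose population is known for every city
base_year <- 2020

filled <- df %>%
  group_by(city) %>%
  # pop[year == base_year] is a single value per city, recycled over its rows
  mutate(pop = pop[year == base_year] * growth) %>%
  ungroup()
```

Because the base value is looked up by year, no arrange() step is needed, and the rows keep their original order.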

Using base R:

df <- df[order(df[["city"]], df[["year"]]), ]
df[["pop"]] <-
  unlist(
    lapply(
      unique(df[["city"]]), 
      function(x) with(df[df[["city"]] == x, ], pop[1] * growth)
    )
  )
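
If sorting the data first is undesirable, base R's ave() can broadcast each city's known population to all of its rows. This is only a sketch, assuming every city has exactly one non-missing pop value (the 2020 figure); base_pop is a name introduced here for illustration.

```r
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# For each city, take the first non-missing population; ave() recycles
# that single value across all rows of the city and keeps the row order.
base_pop <- ave(df$pop, df$city, FUN = function(p) p[!is.na(p)][1])

# Scale the base population by the growth factor of each row's year
df$pop <- base_pop * df$growth
```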

Using data.table:

library(data.table)
setDT(df)[order(city, year), pop := pop[1] * growth, city]

Data:

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3), 
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)), 
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

1 Comment

Amazing, thank you so much. This answer is nearly encyclopedic. I will transfer these answers to my original data and try the dplyr solution outlined here, which is very similar to the other solutions posted.
1

Is the base year always 2020? If yes, the following works:

library(tidyverse)

df <- tibble( year = rep(c(2020, 2030, 2040, 2050), each = 3), 
              city = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
              pop = c(3704, 29274, 10275, rep(NA_integer_, times = 9)), 
              growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )

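# Collect the known (2020) populations; this assumes no two cities share the same value
uniq <- unique(df$pop)
uniq <- uniq[!is.na(uniq)]

# Repeat them once per year; this assumes the cities appear in the same order within each year
df$pop <- rep(uniq, length(unique(df$year)))

# pop now holds the 2020 population on every row; pop2 is the projection
df <- df %>% 
  mutate(pop2 = pop * growth)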
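
The block above relies on the rows being ordered the same way within every year. A hedged alternative that matches on the city name instead uses base R's merge(). The names base, base_pop, and out are illustrative, and the base year of 2020 is assumed, as in the answer.

```r
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

# One row per city holding its 2020 population
base <- df[df$year == 2020, c("city", "pop")]
names(base)[names(base) == "pop"] <- "base_pop"

# Attach the base population to every row of the same city
out <- merge(df, base, by = "city")

# Project the population and drop the helper column
out$pop <- out$base_pop * out$growth
out$base_pop <- NULL

# merge() sorts by the key, so restore a year-then-city ordering
out <- out[order(out$year, out$city), ]
```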

1 Comment

Thank you! It is good to see how mutate can be used here as an alternative to a join.
0
library(tidyverse)
NAME <- c("Lucknow", "Delhi", "Hyderadabad")
YEAR <- seq(2020, 2050, 10)
# Repeat the 2020 populations once per year (expand.grid varies Name fastest)
POPULATION <- rep(c(3704, 29274, 10275), 4)
pop_df <- bind_cols(expand.grid(Name = NAME, Year = YEAR), Population = POPULATION)
# Growth factor per year, as given in the question
growth_df <- data.frame(Year = YEAR, growth = c(1, 1.1, 1.18, 1.24))
pop_df <- left_join(pop_df, growth_df, by = "Year") %>%
  mutate(Population = round(Population * growth))

1 Comment

Thank you! This works well. I can see that the main thing I was missing was the correct form of join.
