Compute new rows in an R data frame based on existing rows and columns

Question

I would appreciate a hint as to which command to use for the following: I want to compute population estimates for each city in column "Name" and for every year in column "Year". The column "growth" provides the growth rate. As a formula, it would be:

Population[Lucknow,2030] = Population[Lucknow, 2020] * growth[2030]

and so on. The data frame is:

df <- data.frame(YEAR=c(2020,2020,2020,2030,2040,2050), NAME=c("Lucknow","Delhi","Hyderadabad",NA,NA,NA), POPULATION=c(3704, 29274,10275,NA,NA,NA), growth=c(1.0,1.0,1.0,1.10,1.18,1.24))
Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030                <NA>                   NA     1.1000000
2040                <NA>                   NA     1.1800000
2050                <NA>                   NA     1.2400000

edit: Following what Dom (thank you!) wrote below, the input would be:

df <- tibble( year = rep(c(2020,2030,2040,2050), each = 3), city =rep(c("Lucknow","Delhi","Hyderadabad"), times = 4), pop = c(3704, 29274,10275, rep(NA_integer_, times = 9)), growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )
    year city          pop growth
   <dbl> <chr>       <dbl>  <dbl>
 1  2020 Lucknow      3704   1   
 2  2020 Delhi       29274   1   
 3  2020 Hyderadabad 10275   1   
 4  2030 Lucknow        NA   1.1 
 5  2030 Delhi          NA   1.1 
 6  2030 Hyderadabad    NA   1.1 
 7  2040 Lucknow        NA   1.18
 8  2040 Delhi          NA   1.18
 9  2040 Hyderadabad    NA   1.18
10  2050 Lucknow        NA   1.24
11  2050 Delhi          NA   1.24
12  2050 Hyderadabad    NA   1.24

The output should look like:

Year                Name           Population        growth
2020             Lucknow                 3704     1.0000000
2020               Delhi                29274     1.0000000
2020           Hyderabad                10275     1.0000000
2030             Lucknow               4074.4     1.1000000
2030               Delhi              32201.4     1.1000000
2030           Hyderabad              11302.5     1.1000000
....

How to fill the NAs in the tibble?

I made various attempts with merge and dplyr::mutate, but could not work out what I need to do here, given that this is a vector operation. I'd be happy for any leads towards the correct command for such a basic operation.

Thanks!

I need to go to bed but this is what your data should look like: df <- tibble( year = rep(c(2020,2030,2040,2050), each = 3), city = rep(c("Lucknow","Delhi","Hyderadabad"), times = 4), pop = c(3704, 29274,10275, rep(NA_integer_, times = 9)), growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )
– Dom, Commented Dec 7, 2018 at 11:30
I think this is going to be a simple group_by operation but I haven't been able to find the simple solution.
– Dom, Commented Dec 7, 2018 at 11:31
Thank you, Dom. I have included your tibble suggestion in the original post. How do I fill the NA values then? (My original data set is much larger.)
– Steffen, Commented Dec 7, 2018 at 12:00

s_baldur · Accepted Answer · 2018-12-07 12:49:14Z

2

Using dplyr:

library(dplyr)
df %>%
  arrange(city, year) %>%
  group_by(city) %>%
  mutate(pop = pop[1] * growth)

# A tibble: 12 x 4
# Groups:   city [3]
    year city           pop growth
   <dbl> <chr>        <dbl>  <dbl>
 1  2020 Delhi       29274    1   
 2  2030 Delhi       32201.   1.1 
 3  2040 Delhi       34543.   1.18
 4  2050 Delhi       36300.   1.24
 5  2020 Hyderadabad 10275    1   
 6  2030 Hyderadabad 11303.   1.1 
 7  2040 Hyderadabad 12124.   1.18
 8  2050 Hyderadabad 12741    1.24
 9  2020 Lucknow      3704    1   
10  2030 Lucknow      4074.   1.1 
11  2040 Lucknow      4371.   1.18
12  2050 Lucknow      4593.   1.24
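
The pipeline above relies on arrange() putting the base year first within each city. As a minimal sketch of a variant that picks the base-year row explicitly instead, assuming (as in the answer above) that every growth value is relative to the earliest year and not compounded from one period to the next, something like the following should work. The name res is only illustrative, and a plain data.frame is used so that dplyr is the only package needed:

```r
library(dplyr)

# Same data as in the question, built as a plain data.frame
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(city) %>%
  # which.min(year) locates the base-year row inside each city,
  # so the result does not depend on how the rows are sorted
  mutate(pop = pop[which.min(year)] * growth) %>%
  ungroup()
```

Because no sorting is involved, the rows of res stay in their original order (all cities for 2020, then 2030, and so on).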

Using base R:

df <- df[order(df[["city"]], df[["year"]]), ]
df[["pop"]] <-
  unlist(
    lapply(
      unique(df[["city"]]), 
      function(x) with(df[df[["city"]] == x, ], pop[1] * growth)
    )
  )
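
A possible alternative in base R, sketched here under the same assumption that growth is relative to the earliest year, is to build a named lookup vector of base-year populations and index it by city. This avoids sorting and the lapply() call. The names is_base and base_pop are made up for this illustration:

```r
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3),
  stringsAsFactors = FALSE  # keep city as character so name indexing works
)

# Named lookup vector: base-year population keyed by city name
is_base  <- df$year == min(df$year)
base_pop <- setNames(df$pop[is_base], df$city[is_base])

# Look up each row's base population by city, then apply that row's growth factor
df$pop <- unname(base_pop[df$city]) * df$growth
```

Note that city must be a character column here; indexing a named vector with a factor would use the integer codes instead of the names.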

Using data.table:

library(data.table)
setDT(df)[order(city, year), pop := pop[1] * growth, city]

Data:

df <- tibble(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3), 
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)), 
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3)
)

edited Dec 7, 2018 at 12:49

answered Dec 7, 2018 at 12:43


1 Comment

Steffen Over a year ago

Amazing, thank you so much. This answer is nearly encyclopedic. I will transfer these answers to my original data and try the dplyr solution outlined here, which is very similar to the other solutions posted.

huan · Accepted Answer · 2018-12-07 12:28:31Z

1

Is the base year always 2020? If yes, the following works:

library(tidyverse)

df <- tibble( year = rep(c(2020, 2030, 2040, 2050), each = 3), 
              city = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4), 
              pop = c(3704, 29274, 10275, rep(NA_integer_, times = 9)), 
              growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3) )

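# Collect the known base-year populations (assumes every city has a distinct value)
uniq <- unique(df$pop)
uniq <- uniq[!is.na(uniq)]

# Repeat them once per year; this relies on the cities appearing in the same order within each year
df$pop <- rep(uniq, length(unique(df$year)))

# Scale the base population by each row's growth factor
df <- df %>% 
  mutate(pop2 = pop * growth)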

answered Dec 7, 2018 at 12:28


1 Comment

Steffen Over a year ago

Thank you! Seeing how to use mutate in here is good, an alternative to join.

Erich Neuwirth · Accepted Answer · 2018-12-07 12:03:23Z

0

library(tidyverse)
NAME <- c("Lucknow","Delhi","Hyderadabad")
YEAR <- seq(2020,2050,10)
# Base-year population, repeated once per year (expand.grid varies Name fastest)
POPULATION <- rep(c(3704, 29274,10275),4)
pop_df <- bind_cols(expand.grid(Name=NAME,Year=YEAR),Population=POPULATION)
# Growth factors as given in the question (1.24 for 2050)
growth_df <- data.frame(Year=seq(2020,2050,10),growth=c(1,1.1,1.18,1.24))
pop_df <- left_join(pop_df,growth_df,by="Year") %>%
  mutate(Population=round(Population*growth))
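
If the data is already in the long layout from the question's edit (one row per city and year), the same join idea can be sketched in base R with merge(): pull out the base-year rows, join them back by city, and multiply. This is only an illustration; base, base_pop and out are hypothetical names, and the base year is assumed to be the earliest year in the data:

```r
df <- data.frame(
  year   = rep(c(2020, 2030, 2040, 2050), each = 3),
  city   = rep(c("Lucknow", "Delhi", "Hyderadabad"), times = 4),
  pop    = c(3704, 29274, 10275, rep(NA, times = 9)),
  growth = rep(c(1.0, 1.10, 1.18, 1.24), each = 3),
  stringsAsFactors = FALSE
)

# One row per city holding its base-year population
base <- df[df$year == min(df$year), c("city", "pop")]
names(base)[names(base) == "pop"] <- "base_pop"

# Join the base population onto every (city, year) row, then scale it
out <- merge(df[, c("year", "city", "growth")], base, by = "city")
out$pop <- out$base_pop * out$growth

# Restore a readable row order and the original column layout
out <- out[order(out$year, out$city), c("year", "city", "pop", "growth")]
rownames(out) <- NULL
```

Unlike the version above, this keeps the decimals (for example 4074.4 for Lucknow in 2030) because it does not call round().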

answered Dec 7, 2018 at 12:03


1 Comment

Steffen Over a year ago

Thank you! This works well, I can see that the main thing I was missing is the correct form of join.
