Adding rows in datasets for missing values with R

Question

I am working with R. I have a list of datasets, each of which should have five rows, one for each month (Jan-May). It should look like this:

data.frame(name = rep("B", 5), 
           doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
           i_name = rep("Aa", 5), 
           aggregation = rep("34", 5))

But some of my datasets don't have data for certain months, or are completely empty, and therefore have fewer rows or no rows at all, like this:

data.frame(name = "A", 
           doc_month = "2022.01", 
           i_name = "Aa", 
           aggregation = "34")

I would like to extend each dataset, even the empty ones, so that it contains all five months, copying the other information into each new row and putting 0 for aggregation.

I tried to use expand() and complete() from tidyr but couldn't make it work.

Carl · Accepted Answer · 2022-06-10 08:11:20Z

1

Use tidyr's complete(), with purrr's reduce() to bind the data frames together first.

I also changed aggregation to numeric: aggregation = rep(34, 5).

library(tidyverse)

df1 <- data.frame(name = rep("B", 5), 
                  doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
                  i_name = rep("Aa",5), 
                  aggregation = rep(34, 5))

df2 <- data.frame(name = "A", 
                  doc_month = "2022.01", 
                  i_name = "Aa", 
                  aggregation = 34)

reduce(list(df1, df2, df1), bind_rows) |> 
  complete(doc_month, nesting(name, i_name), fill = list(aggregation = 0))
#> # A tibble: 15 × 4
#>    doc_month name  i_name aggregation
#>    <chr>     <chr> <chr>        <dbl>
#>  1 2022.01   A     Aa              34
#>  2 2022.01   B     Aa              34
#>  3 2022.01   B     Aa              34
#>  4 2022.02   A     Aa               0
#>  5 2022.02   B     Aa              34
#>  6 2022.02   B     Aa              34
#>  7 2022.03   A     Aa               0
#>  8 2022.03   B     Aa              34
#>  9 2022.03   B     Aa              34
#> 10 2022.04   A     Aa               0
#> 11 2022.04   B     Aa              34
#> 12 2022.04   B     Aa              34
#> 13 2022.05   A     Aa               0
#> 14 2022.05   B     Aa              34
#> 15 2022.05   B     Aa              34

Created on 2022-06-10 by the reprex package (v2.0.1)

edited Jun 10, 2022 at 8:11

answered Jun 9, 2022 at 13:42

2 Comments

Kata Over a year ago

Thank you! Unfortunately, this is not very practical because I have a list of 72 datasets, so I can't join all of them... unless there is a way?

Carl Over a year ago

Yes, use reduce. See update above.
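
For a long list, here is a minimal sketch of the same idea. It assumes the datasets sit in a list (the `datasets` below is a made-up three-element example) and that the five months are known in advance (the hypothetical `all_months` vector). `bind_rows()` accepts a list directly, and passing the months to `complete()` explicitly means all five appear even if none of the datasets contains one of them. Note that a dataset with zero rows contributes no name/i_name pair, so this approach cannot create rows for it.

```r
library(dplyr)
library(tidyr)

# Made-up example list; in practice this would be the list of 72 datasets.
datasets <- list(
  data.frame(name = "A", doc_month = "2022.01", i_name = "Aa", aggregation = 34),
  data.frame(name = "B", doc_month = c("2022.02", "2022.04"),
             i_name = "Bb", aggregation = c(10, 20)),
  data.frame(name = "C", doc_month = "2022.03", i_name = "Cc", aggregation = 5)
)

# Months every dataset should contain (assumed to be known up front).
all_months <- c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05")

completed <- bind_rows(datasets) |>
  # One row per month for every (name, i_name) pair present in the data.
  complete(doc_month = all_months, nesting(name, i_name),
           fill = list(aggregation = 0)) |>
  arrange(name, doc_month)

# Optionally split back into one data frame per name.
completed_list <- split(completed, completed$name)
```

If the list form is not needed afterwards, `completed` can be used as a single long table and the `split()` step dropped.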

Andrea M · Accepted Answer · 2022-06-10 08:10:22Z

0

You could create a skeleton dataset with the five months and then join it to each of your partial datasets.

library(dplyr)
library(tidyr)

data_A <- data.frame(name = "A", 
                     doc_month = "2022.01", 
                     i_name = "Aa", 
                     aggregation = "34")

reference <- data.frame(doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"))

data_A |>
  full_join(reference, by = "doc_month") |> 
  mutate(aggregation = replace_na(aggregation, "0")) |>
  fill(name, i_name)

Output:

#>   name doc_month i_name aggregation
#> 1    A   2022.01     Aa          34
#> 2    A   2022.02     Aa           0
#> 3    A   2022.03     Aa           0
#> 4    A   2022.04     Aa           0
#> 5    A   2022.05     Aa           0

Created on 2022-06-10 by the reprex package (v2.0.1)

edited Jun 10, 2022 at 8:10

answered Jun 9, 2022 at 11:31

2 Comments

Kata Over a year ago

Thank you! However, I don't understand how to then carry the basic information, like name, over to the new rows. I only get NAs.

Andrea M Over a year ago

You can use tidyr::fill() for that. I've updated my answer.
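
Since `fill()` has nothing to copy from when a dataset is completely empty, here is a rough sketch of one way to cover that case across a whole list. It rests on assumptions not in the question: the list is named after each dataset's `name`, empty datasets still have the four columns, and the `i_name` for each dataset is available from a lookup (the hypothetical `i_names` vector). `pad_months()` is a made-up helper, not a tidyr function.

```r
library(dplyr)
library(tidyr)

reference <- data.frame(
  doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05")
)

# Hypothetical helper: pads one dataset to the five reference months.
# The fixed values are passed in explicitly so that empty datasets work too.
pad_months <- function(df, name_value, i_name_value) {
  reference |>
    left_join(select(df, doc_month, aggregation), by = "doc_month") |>
    mutate(name = name_value,
           i_name = i_name_value,
           aggregation = replace_na(aggregation, "0")) |>
    select(name, doc_month, i_name, aggregation)
}

# Made-up example list: one partial dataset and one empty dataset.
datasets <- list(
  A = data.frame(name = "A", doc_month = "2022.01",
                 i_name = "Aa", aggregation = "34"),
  B = data.frame(name = character(), doc_month = character(),
                 i_name = character(), aggregation = character())
)

# Assumed lookup of i_name per dataset, keyed by list name.
i_names <- c(A = "Aa", B = "Bb")

# Apply the helper to every dataset along with its name and i_name.
padded <- Map(pad_months, datasets, names(datasets), i_names[names(datasets)])
```

Starting the join from `reference` rather than from the data keeps the months in order and always yields five rows, provided each dataset has at most one row per month.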
