I am working with R. I have a list of datasets, each of which should have five rows, one for each month (Jan-May). Each should look like this:

data.frame(name = rep("B", 5), 
           doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
           i_name = rep("Aa",5), 
           aggregation = rep("34", 5))

But some of my datasets don't have data for certain months, or are completely empty, and therefore have fewer rows or no rows at all. Like this:

data.frame(name = "A", 
           doc_month = "2022.01", 
           i_name = "Aa", 
           aggregation = "34")

I would like to extend each dataset, even the empty ones, with the missing months, copy all the other information into the new rows, and put a 0 for aggregation.

I tried to use expand and complete from tidyr but couldn't make it work.

2 Answers

Use tidyr's complete to add the missing months, with purrr's reduce to bind the data frames together first.

I also changed aggregation to numeric: aggregation = rep(34, 5).

library(tidyverse)

df1 <- data.frame(name = rep("B", 5), 
                  doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
                  i_name = rep("Aa",5), 
                  aggregation = rep(34, 5))

df2 <- data.frame(name = "A", 
                  doc_month = "2022.01", 
                  i_name = "Aa", 
                  aggregation = 34)

reduce(list(df1, df2, df1), bind_rows) |> 
  complete(doc_month, nesting(name, i_name), fill = list(aggregation = 0))
#> # A tibble: 15 × 4
#>    doc_month name  i_name aggregation
#>    <chr>     <chr> <chr>        <dbl>
#>  1 2022.01   A     Aa              34
#>  2 2022.01   B     Aa              34
#>  3 2022.01   B     Aa              34
#>  4 2022.02   A     Aa               0
#>  5 2022.02   B     Aa              34
#>  6 2022.02   B     Aa              34
#>  7 2022.03   A     Aa               0
#>  8 2022.03   B     Aa              34
#>  9 2022.03   B     Aa              34
#> 10 2022.04   A     Aa               0
#> 11 2022.04   B     Aa              34
#> 12 2022.04   B     Aa              34
#> 13 2022.05   A     Aa               0
#> 14 2022.05   B     Aa              34
#> 15 2022.05   B     Aa              34

Created on 2022-06-10 by the reprex package (v2.0.1)
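
For a whole list of data frames, the same idea can be sketched as follows. This is only a sketch under assumptions: the list name `datasets`, the vector `months`, and the result names are hypothetical, and every data frame is assumed to have the same four columns. `bind_rows()` accepts a list directly, and passing the months explicitly to `complete()` also covers a month that is missing from every dataset. Note that a completely empty data frame contributes no `name`/`i_name` combination, so this approach cannot create rows for it.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: a list of data frames with identical columns
datasets <- list(
  data.frame(name = rep("B", 5),
             doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"),
             i_name = rep("Aa", 5),
             aggregation = rep(34, 5)),
  data.frame(name = "A", doc_month = "2022.01", i_name = "Aa", aggregation = 34)
)

# The months every dataset should contain ("2022.01" to "2022.05")
months <- sprintf("2022.%02d", 1:5)

# Stack all data frames, then add one row per missing month and
# name/i_name combination, with aggregation set to 0
completed <- bind_rows(datasets) |>
  complete(doc_month = months, nesting(name, i_name),
           fill = list(aggregation = 0))

# Optionally split back into one data frame per name
completed_list <- split(completed, completed$name)
```

Splitting by `name` at the end assumes each dataset has a single, distinct `name`; if that does not hold, `bind_rows()` with its `.id` argument could be used to keep track of the original list element instead.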

2 Comments

Thank you! Unfortunately, this is not very practical because I have a list of 72 datasets... so I can't join all of them... unless there is a way?
Yes, use reduce. See update above.

You could create a skeleton dataset with the five months and then join it to each of your partial datasets.

library(dplyr)
library(tidyr)

data_A <- data.frame(name = "A", 
                     doc_month = "2022.01", 
                     i_name = "Aa", 
                     aggregation = "34")

reference <- data.frame(doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"))

data_A |>
        full_join(reference, by = "doc_month") |> 
        mutate(aggregation = replace_na(aggregation, "0")) |>
        fill(name, i_name)

Output:

#>   name doc_month i_name aggregation
#> 1    A   2022.01     Aa          34
#> 2    A   2022.02     Aa           0
#> 3    A   2022.03     Aa           0
#> 4    A   2022.04     Aa           0
#> 5    A   2022.05     Aa           0

Created on 2022-06-10 by the reprex package (v2.0.1)
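
To apply the skeleton join to every element of a list, one possible sketch wraps the steps in a helper and maps it over the list. The helper name `pad_months`, the list `datasets`, and its element names are hypothetical. Two additions go beyond the code above and are assumptions on my part: `arrange()` sorts the months after the join, and `.direction = "downup"` lets `fill()` work when the first month is one of the missing ones. For a completely empty data frame there is nothing to copy, so `name` and `i_name` stay `NA` and would have to be supplied from somewhere else.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Skeleton with the five months every dataset should contain
reference <- data.frame(doc_month = sprintf("2022.%02d", 1:5))

# Hypothetical helper: pad one data frame to the five reference months
pad_months <- function(df) {
  df |>
    full_join(reference, by = "doc_month") |>
    arrange(doc_month) |>
    mutate(aggregation = replace_na(aggregation, "0")) |>
    fill(name, i_name, .direction = "downup")
}

# Hypothetical input: one partial and one completely empty data frame
datasets <- list(
  A = data.frame(name = "A", doc_month = "2022.03",
                 i_name = "Aa", aggregation = "34"),
  empty = data.frame(name = character(), doc_month = character(),
                     i_name = character(), aggregation = character())
)

# Apply the helper to every element; the result is again a named list
padded <- map(datasets, pad_months)
```

Here `aggregation` is kept as character to match the data in the question; with a numeric column the replacement value would be `0` rather than `"0"`.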

2 Comments

Thank you! However, I don't understand how to then carry the basic information like name etc. into the new rows - I only get NAs.
You can use tidyr::fill() for that. I've updated my answer.
