Fill down a range of rows in R [closed]

Question

Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 2 years ago.


I have a panel dataset in R, which includes observations per group over time (month). The following dataframe is a snapshot of the complete dataframe:

df <- data.frame(
  group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  month = c("January", "January", "January", "February", "February", "February",
            "March", "March", "March",
            "January", "January", "February", "February", "March", "March"),
  first_value = c("A", "BC", "D", NA, NA, NA, "D", "G", "H",
                  "K", "L", NA, NA, "DE", "GH"),
  second_value = c(1, 5, 7, NA, NA, NA, 2, 3, 9,
                   7, 1, NA, NA, 4, 4)
)

The dataset is already arranged by group and time. As you can see, the observations ("first_value" and "second_value") can be completely empty for a group in a given month (here February, but it can be any month except the first and the last month of each group). What I want is for the empty months to be filled with the values of the last non-empty previous month within the same group.

I want to get the following dataframe:

df_filled <- data.frame(
  group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  month = c("January", "January", "January", "February", "February", "February",
            "March", "March", "March",
            "January", "January", "February", "February", "March", "March"),
  first_value = c("A", "BC", "D", "A", "BC", "D", "D", "G", "H",
                  "K", "L", "K", "L", "DE", "GH"),
  second_value = c(1, 5, 7, 1, 5, 7, 2, 3, 9,
                   7, 1, 7, 1, 4, 4)
)

Please note that, by construction, the last non-empty previous month always has the same number of observations as the empty months that follow it.

I tried different commands with fill() from the tidyr package and na.locf() from the zoo package, but all I achieved was filling down the last row of the last non-empty previous month, which gives:

df_filled <- data.frame(
  group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  month = c("January", "January", "January", "February", "February", "February",
            "March", "March", "March",
            "January", "January", "February", "February", "March", "March"),
  first_value = c("A", "BC", "D", "D", "D", "D", "D", "G", "H",
                  "K", "L", "L", "L", "DE", "GH"),
  second_value = c(1, 5, 7, 7, 7, 7, 2, 3, 9,
                   7, 1, 1, 1, 4, 4)
)
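
For reference, a call along the following lines reproduces that unwanted outcome. It is only an assumption about what was attempted, since the exact commands are not shown; the object name naive is illustrative. Filling down per group carries the single most recent non-NA row forward, which is why every empty row receives the last row of the previous month.

```r
library(dplyr)
library(tidyr)

# Same example data as above, written compactly
df <- data.frame(
  group = c(rep(1, 9), rep(2, 6)),
  month = c(rep(c("January", "February", "March"), each = 3),
            rep(c("January", "February", "March"), each = 2)),
  first_value = c("A", "BC", "D", NA, NA, NA, "D", "G", "H",
                  "K", "L", NA, NA, "DE", "GH"),
  second_value = c(1, 5, 7, NA, NA, NA, 2, 3, 9,
                   7, 1, NA, NA, 4, 4)
)

# Plain fill-down within each group: every NA takes the closest
# non-NA value above it, i.e. the last row of the previous month
naive <- df %>%
  group_by(group) %>%
  fill(first_value, second_value, .direction = "down") %>%
  ungroup()
```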

Looking forward to your suggestions. Thanks.

Andre Wildberg · Accepted Answer · 2023-08-01 15:04:36Z


An approach using row_number(), assuming there are no two consecutive months with NA:

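library(dplyr)

# The .by argument requires dplyr >= 1.1.0
df %>% 
  # count the NA rows in each group-month (0 for non-empty months)
  mutate(n_na = sum(is.na(first_value)), .by = c(group, month)) %>% 
  # within each group, replace every NA with the value n_na rows above it
  mutate(across(ends_with("_value"), ~ 
           if_else(is.na(.x), .x[row_number() - n_na], .x)), .by = group, 
         n_na = NULL)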
   group    month first_value second_value
1      1  January           A            1
2      1  January          BC            5
3      1  January           D            7
4      1 February           A            1
5      1 February          BC            5
6      1 February           D            7
7      1    March           D            2
8      1    March           G            3
9      1    March           H            9
10     2  January           K            7
11     2  January           L            1
12     2 February           K            7
13     2 February           L            1
14     2    March          DE            4
15     2    March          GH            4

If consecutive NA months are possible, this approach takes a bit of group juggling:

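df %>% 
  # count the NA rows in each group-month
  mutate(n_na = sum(is.na(first_value)), .by = c(group, month)) %>% 
  # n_na of the preceding row within the group
  mutate(n_lag = lag(n_na, default = 0), .by = group) %>% 
  # for empty months, add the NA count of the preceding month to the offset
  mutate(n_lag = n_na + first(n_lag), 
         n_na = if_else(n_na != 0, n_lag, n_na), 
         n_lag = NULL, .by = c(group, month)) %>% 
  # within each group, replace every NA with the value n_na rows above it
  mutate(across(ends_with("_value"), ~ 
           if_else(is.na(.x), .x[row_number() - n_na], .x)), .by = group, 
         n_na = NULL)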
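
The offset in the snippet above only adds the NA count of the directly preceding month, so it appears to reach back at most two months; a run of three or more empty months may therefore stay NA. A possible alternative, offered as a sketch rather than a tested general solution, is to number each row by its position within its month and then fill down per (group, position) with tidyr::fill(). It assumes, as the question states, that rows are ordered by time within each group and that every empty month has as many rows as the last non-empty month. The helper column pos is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data from the question, written compactly
df <- data.frame(
  group = c(rep(1, 9), rep(2, 6)),
  month = c(rep(c("January", "February", "March"), each = 3),
            rep(c("January", "February", "March"), each = 2)),
  first_value = c("A", "BC", "D", NA, NA, NA, "D", "G", "H",
                  "K", "L", NA, NA, "DE", "GH"),
  second_value = c(1, 5, 7, NA, NA, NA, 2, 3, 9,
                   7, 1, NA, NA, 4, 4)
)

df_filled <- df %>%
  # pos = position of the row within its group-month (1, 2, 3, ...)
  group_by(group, month) %>%
  mutate(pos = row_number()) %>%
  # rows sharing a group and a position form one series over time;
  # filling down carries the last non-NA value at that position forward
  group_by(group, pos) %>%
  fill(first_value, second_value, .direction = "down") %>%
  ungroup() %>%
  select(-pos)
```

Because the fill runs separately for each position, the number of consecutive empty months does not matter under the stated assumptions. Using group_by() instead of .by also keeps the sketch usable on dplyr versions before 1.1.0.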

edited Aug 1, 2023 at 15:04

answered Aug 1, 2023 at 14:33


2 Comments

tobibow Over a year ago

Thanks for your answer. Maybe I should have been more precise in my description: there can be two or more consecutive months with only NAs. Moreover, when running your code, I get the following error message: Error in mutate(): ! Problem while computing .by = c(group, month). x .by must be size 15 or 1, not 30. Run rlang::last_error() to see where the error occurred.

Andre Wildberg Over a year ago

@tobibow Regarding the error, you may be running dplyr < 1.1.0. In that case, use %>% group_by(group, month) %>% ... %>% ungroup() %>% ..., or simply upgrade. 2 or more NA months should work with the second code.
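
The group_by()/ungroup() suggestion above could look roughly like this for the first snippet. This is a sketch of that suggestion, not code posted by the answerer; the result name res is illustrative, and it carries the same assumption of no two consecutive NA months.

```r
library(dplyr)

# Example data from the question, written compactly
df <- data.frame(
  group = c(rep(1, 9), rep(2, 6)),
  month = c(rep(c("January", "February", "March"), each = 3),
            rep(c("January", "February", "March"), each = 2)),
  first_value = c("A", "BC", "D", NA, NA, NA, "D", "G", "H",
                  "K", "L", NA, NA, "DE", "GH"),
  second_value = c(1, 5, 7, NA, NA, NA, 2, 3, 9,
                   7, 1, NA, NA, 4, 4)
)

res <- df %>%
  # replaces .by = c(group, month): count NA rows per group-month
  group_by(group, month) %>%
  mutate(n_na = sum(is.na(first_value))) %>%
  # replaces .by = group: look n_na rows back within the group
  group_by(group) %>%
  mutate(across(ends_with("_value"),
                ~ if_else(is.na(.x), .x[row_number() - n_na], .x))) %>%
  ungroup() %>%
  select(-n_na)
```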
