Replacing missing values

Question

Let's say I have a data frame containing the sales for some quarters, while the values for the following quarters are missing. I would like to replace the NAs using a simple formula: each missing value should be 1.05 times the value from four quarters earlier (with mutate/dplyr like below). The issue is that each mutate call only fills the next four quarters, so I would have to repeat it many times. How could I fill all the NAs at the same time? Is there a way?

test <- structure(list(Period = c("1999Q1", "1999Q2", "1999Q3", "1999Q4", 
"2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1", "2001Q2", "2001Q3", 
"2001Q4", "2002Q1", "2002Q2", "2002Q3", "2002Q4", "2003Q1", "2003Q2", 
"2003Q3", "2003Q4"), Sales = c(353.2925571, 425.9299841, 357.5204626, 
363.80247, 302.8081066, 394.328576, 435.15573, 387.99768, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-20L))

library(dplyr)

# each mutate() fills only the next four missing quarters
test %>%
      mutate(Sales = ifelse(is.na(Sales), 1.05*lag(Sales, 4), Sales)) %>%
      mutate(Sales = ifelse(is.na(Sales), 1.05*lag(Sales, 4), Sales)) %>%
      mutate(Sales = ifelse(is.na(Sales), 1.05*lag(Sales, 4), Sales))

Oh, I'm not limited to dplyr. Any solution that provides the desired output is acceptable. Thanks.
– AlexB, Commented Aug 31, 2019 at 6:52
Is it fine to assume that your NAs are only at the end of the data? I.e. you won't have, for example, data for 1999 and 2001 but some NAs in 2000?
– bluk, Commented Aug 31, 2019 at 7:23
Yes, the missing values are only at the end of the data frame.
– AlexB, Commented Aug 31, 2019 at 7:27

tmfmnk · Accepted Answer · 2019-08-31 07:50:32Z

One dplyr and tidyr possibility could be:

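library(dplyr)
library(tidyr)

# df is the data frame from the question (called test there)
df %>%
 group_by(quarter = substr(Period, 5, 6)) %>%
 # within each quarter, fill the NAs with the last observed value
 mutate(Sales_temp = replace_na(Sales, last(na.omit(Sales)))) %>%
 group_by(quarter, na = is.na(Sales)) %>%
 # cumprod() gives 1.05, 1.05^2, ... along the missing rows of each quarter
 mutate(constant = 1.05,
        Sales_temp = Sales_temp * cumprod(constant),
        Sales = coalesce(Sales, Sales_temp)) %>%
 ungroup() %>%
 select(1:2)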

   Period Sales
   <chr>  <dbl>
 1 1999Q1  353.
 2 1999Q2  426.
 3 1999Q3  358.
 4 1999Q4  364.
 5 2000Q1  303.
 6 2000Q2  394.
 7 2000Q3  435.
 8 2000Q4  388.
 9 2001Q1  318.
10 2001Q2  414.
11 2001Q3  457.
12 2001Q4  407.
13 2002Q1  334.
14 2002Q2  435.
15 2002Q3  480.
16 2002Q4  428.
17 2003Q1  351.
18 2003Q2  456.
19 2003Q3  504.
20 2003Q4  449.

Or with just dplyr:

library(dplyr)

# same pipeline, with if_else() in place of tidyr::replace_na()
df %>%
 group_by(quarter = substr(Period, 5, 6)) %>%
 mutate(Sales_temp = if_else(is.na(Sales), last(na.omit(Sales)), Sales)) %>%
 group_by(quarter, na = is.na(Sales)) %>%
 mutate(constant = 1.05,
        Sales_temp = Sales_temp * cumprod(constant),
        Sales = coalesce(Sales, Sales_temp)) %>%
 ungroup() %>%
 select(1:2)
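
As a cross-check on the pipelines above, here is a minimal base R sketch that applies the question's rule directly in a loop. It is only an illustration, not part of the original answer: the names sales and filled are made up for this example, and it assumes the NAs sit only at the end with at least four observed values before them.

```r
# Sketch only: fill trailing NAs with 1.05 times the value four rows earlier.
# Assumes the NAs are at the end and at least four observed values precede them.
sales <- c(353.2925571, 425.9299841, 357.5204626, 363.80247,
           302.8081066, 394.328576, 435.15573, 387.99768,
           rep(NA_real_, 12))

filled <- sales
for (i in which(is.na(filled))) {
  # which() returns increasing positions, so filled[i - 4] is already filled in
  filled[i] <- 1.05 * filled[i - 4]
}

round(filled[9:12], 1)
```

The first four filled values should agree with rows 9 to 12 of the printed tibble, up to rounding.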

bluk · Answer · 2019-08-31 08:09:29Z

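x <- test$Sales

# position of the last non-NA value
last.valid <- tail(which(!is.na(x)), 1)

# store the "base": the last observed value for each slot of the 4-row cycle
base <- ceiling(last.valid/4)*4 + (-3:0)
base <- base + ifelse(base > last.valid, -4, 0)
base <- x[base]

# calculate the "exponents": how many cycles each row lies past the last observed value
expos <- ceiling( ( seq(length(x)) - last.valid ) / 4 )

# base (length 4) is recycled along x, so each row picks the base from its own slot
test$Sales <- ifelse(is.na(x), base * 1.05 ^ expos, x)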

tail(test)

#    Period    Sales
# 15 2002Q3 479.7592
# 16 2002Q4 427.7674
# 17 2003Q1 350.5382
# 18 2003Q2 456.4846
# 19 2003Q3 503.7472
# 20 2003Q4 449.1558
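
To make the index arithmetic easier to follow, the sketch below prints the intermediate values for a vector shaped like the question's (8 observed rows followed by 12 missing ones). The values in x are placeholders and the name idx is introduced here only for illustration; the answer itself reuses base for both the positions and the values.

```r
# Sketch only: the index arithmetic from the answer on placeholder data.
x <- c(rep(100, 8), rep(NA_real_, 12))   # 8 observed values, then 12 NAs

last.valid <- tail(which(!is.na(x)), 1)  # 8

# positions of the 4-row block that contains last.valid
idx <- ceiling(last.valid / 4) * 4 + (-3:0)
# any position past last.valid is moved back one cycle to an observed row
idx <- idx + ifelse(idx > last.valid, -4, 0)

# number of 4-row cycles between each row and last.valid, rounded up
expos <- ceiling((seq_along(x) - last.valid) / 4)

idx    # 5 6 7 8
expos  # -1 -1 -1 -1 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
```

Only the rows where x is NA use these exponents, so the negative and zero entries for the observed rows are never applied.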


2 Comments

Thank you for this one. However, I had to accept the other reply as the solution for my request. All the best!
– AlexB

No worries, glad you found a solution :)
– bluk

Cole · Answer · 2019-08-31 12:27:27Z

Here's another base solution:

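non_nas <- na.omit(test$Sales)
nas <- length(attr(non_nas, 'na.action'))  # number of NAs that were dropped

# exponent 1 for the first four missing quarters, 2 for the next four, and so on;
# ceiling() also covers a count of NAs that is not a multiple of 4
test$Sales <- c(non_nas, # keep non_nas
                 tail(non_nas, 4) * 1.05 ^ (rep(seq_len(ceiling(nas / 4)), each = 4, length.out = nas)))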

test
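
The count of missing values above comes from an attribute that na.omit() attaches to its result. A small sketch on made-up values shows what that attribute holds; the names dropped and n_missing are hypothetical. Note that rebuilding the column with c(non_nas, ...) relies on the NAs being at the end, as confirmed in the question's comments.

```r
# Sketch only: what na.omit() records when applied to a plain vector.
x <- c(10, 20, NA, NA, NA)              # illustrative values
non_nas <- na.omit(x)

dropped <- attr(non_nas, "na.action")   # positions of the removed NAs
n_missing <- length(dropped)

as.vector(non_nas)  # 10 20
n_missing           # 3
```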

