I am trying to process data by group using dplyr, but it is not working. Any help would be appreciated. Below is a sample of the data. I want to keep the 2014 value of midfs1 and calculate midfs1 for the remaining years from lag(midfs1) and value. Here is my attempt at the problem.

t3 = t2 %>%
  group_by(cz, btype) %>%
  mutate(midfs1 = ifelse(year == 2014, midfs1,
                         lag(midfs1) * value + lag(midfs1)))

t2 data:

cz  btype    year  midfs   value        midfs1
1   College  2014  5.4254  0.007582767  5.4254
1   College  2015  5.4779  0.007582767  NA
1   College  2016  5.5191  0.007582767  NA
1   College  2017  5.5616  0.007582767  NA
1   College  2018  5.6097  0.007582767  NA
1   Grocery  2012  4.8267  0.002697526  NA
1   Grocery  2013  4.8205  0.002697526  NA
1   Grocery  2014  4.8583  0.002697526  4.8583
1   Grocery  2015  4.8966  0.002697526  NA
1   Grocery  2016  4.9556  0.002697526  NA
1   Grocery  2017  5.0258  0.002697526  NA
1   Grocery  2018  5.0982  0.002697526  NA
1   Grocery  2019  5.1514  0.002697526  NA
1   Grocery  2020  5.1976  0.002697526  NA
1   Grocery  2021  5.2338  0.002697526  NA
  • Do you have to use dplyr? If not, data.table may be a better fit. Also, what exactly do you want to obtain? "Retain the value and calculate the rest" is not very clear. Commented Oct 28, 2015 at 22:36
  • For year 2015 onward, I want to calculate midfs1 as 0.0075 * 5.4254 + 5.4254 and continue that down by group. I hope this helps. Commented Oct 28, 2015 at 22:41
  • I'm not sure exactly what you want to do with your grouping, but try this: t2 %>% group_by(cz, btype) %>% mutate(midfs1 = ifelse(year == 2014, midfs, lag(midfs) * value + lag(midfs))) Commented Oct 28, 2015 at 22:57
  • I tried it, but no luck. The grouping is there to tag the year 2014, for which I have data for the different groups. All I want to do is apply a growth factor (value) to the midfs1 number and recursively build on that. Commented Oct 28, 2015 at 23:49
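A likely reason the attempt in the question fills in only one year: `lag(midfs1)` is evaluated once against the original column, which is `NA` everywhere after 2014, so it never sees the values being computed. The sketch below illustrates this in base R for the College group only, then builds the recursion with `Reduce()`. The names `lagged`, `one_pass` and `recursive` are illustrative and not from the question.

```r
# One group's data: College, 2014-2018 (values taken from the question)
midfs1 <- c(5.4254, NA, NA, NA, NA)
value  <- 0.007582767

# A vectorised lag only sees the original column, so just 2015 gets filled in
lagged   <- c(NA, head(midfs1, -1))
one_pass <- ifelse(is.na(midfs1), lagged * value + lagged, midfs1)

# Reduce() with accumulate = TRUE feeds each result into the next step,
# which is the recursion described in the comments
recursive <- Reduce(function(prev, i) prev * value + prev,
                    seq_along(midfs1)[-1],
                    accumulate = TRUE, init = midfs1[1])

print(one_pass)   # 2016 onward stays NA
print(recursive)  # every year is filled
```

Because the growth factor is constant within a group, this recursion is the same as compounding the 2014 value, which is what makes a loop-free version possible.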

1 Answer

Solution to the compounding growth problem:

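library(dplyr)

t3 <-
  t2 %>%
  group_by(cz, btype) %>%
  filter(year >= 2014) %>%   # drop the years before the 2014 base year
  mutate(my_n = 1:n(),       # 1 for 2014, 2 for 2015, ... (rows must be sorted by year)
         # compound the 2014 value once per year elapsed since 2014
         midfs2 = midfs1[1] * (1 + value) ^ (my_n - 1))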

# result

Source: local data frame [13 x 8]
Groups: cz, btype

   cz   btype year  midfs       value midfs1 my_n   midfs2
1   1 College 2014 5.4254 0.007582767 5.4254    1 5.425400
2   1 College 2015 5.4779 0.007582767     NA    2 5.466540
3   1 College 2016 5.5191 0.007582767     NA    3 5.507991
4   1 College 2017 5.5616 0.007582767     NA    4 5.549757
5   1 College 2018 5.6097 0.007582767     NA    5 5.591839
6   1 Grocery 2014 4.8583 0.002697526 4.8583    1 4.858300
7   1 Grocery 2015 4.8966 0.002697526     NA    2 4.871405
8   1 Grocery 2016 4.9556 0.002697526     NA    3 4.884546
9   1 Grocery 2017 5.0258 0.002697526     NA    4 4.897722
10  1 Grocery 2018 5.0982 0.002697526     NA    5 4.910934
11  1 Grocery 2019 5.1514 0.002697526     NA    6 4.924181
12  1 Grocery 2020 5.1976 0.002697526     NA    7 4.937465
13  1 Grocery 2021 5.2338 0.002697526     NA    8 4.950783
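The pipeline above relies on the rows being sorted by year within each group and drops the pre-2014 rows. As a hedged variation, the exponent can be taken from `year - 2014` directly, which keeps every row and does not depend on row order. This is only a sketch: it assumes exactly one 2014 row per group, the helper column `base` is a name introduced here, and the data frame is a small, deliberately unsorted subset of the question's data.

```r
library(dplyr)

# Small subset of the question's data (Grocery rows deliberately unsorted)
t2 <- data.frame(
  cz     = 1,
  btype  = c("College", "College", "College",
             "Grocery", "Grocery", "Grocery", "Grocery"),
  year   = c(2014, 2015, 2016, 2015, 2013, 2014, 2016),
  value  = c(rep(0.007582767, 3), rep(0.002697526, 4)),
  midfs1 = c(5.4254, NA, NA, NA, NA, 4.8583, NA)
)

t3 <- t2 %>%
  group_by(cz, btype) %>%
  mutate(
    base   = midfs1[year == 2014],  # assumes exactly one 2014 row per group
    # compound from 2014 onward; earlier years are left as NA
    midfs2 = ifelse(year >= 2014, base * (1 + value)^(year - 2014), NA_real_)
  ) %>%
  ungroup()

print(t3)
```

Whether pre-2014 rows should stay `NA` or be dropped is a choice the question does not settle; the sketch keeps them so the row count matches the input.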
7 Comments

Sorry Miha, I did not look at the answer carefully. Your program performs the following calculation for 2016: 0.007582767 * 2 * 5.4254 + 5.4254 = 5.5076790881636. If I follow your way of thinking, the correct calculation should be (0.007582767^2 + 0.007582767 * 2 + 1) * 5.4254 = 5.50799103974085851834. I did not notice this because the answers are close.
I edited the answer and added the cumprod term as well.
I don't think you can do that, Miha, because the pattern is more complicated than just a cumulative product and sum. I know how to accomplish what I need using loops, but I wanted to see whether it is possible without them. I have a feeling this might require some esoteric programming trick.
Have you looked at the results? Are they still wrong? For 2016, the result is identical to what you calculated in your previous comment.
Yes. Here is a subset of the results from using loops for years 2015-2020: 5.466539544 5.50799104 5.549756853 5.591839366 5.634240982 5.676964118
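For what it is worth, the loop values quoted above appear to agree with the midfs2 column in the answer to the printed precision (5.466540, 5.507991, 5.549757, 5.591839 for 2015-2018). A minimal base R sketch comparing a loop with the compounding formula for the College group follows. The variable names are illustrative, and the loop is a guess at what the commenter ran, not their actual code.

```r
base  <- 5.4254        # College midfs1 for 2014, from the question
value <- 0.007582767
years <- 2015:2020

# Loop: each year grows the previous year's result by `value`
loop_vals <- numeric(length(years))
prev <- base
for (i in seq_along(years)) {
  prev <- prev * value + prev
  loop_vals[i] <- prev
}

# Closed form: compound the 2014 value once per elapsed year
closed_vals <- base * (1 + value)^(years - 2014)

# Compare the two approaches up to floating-point tolerance
print(all.equal(loop_vals, closed_vals))
print(round(loop_vals, 9))
```

The two agree because prev * value + prev equals prev * (1 + value), so k steps of the loop multiply the base by (1 + value)^k.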