I have the following data frame:

    Year    Category        TotalSales  AverageCount
1   2013    Beverages      102074.29    22190.06
2   2013    Condiments      55277.56    14173.73
3   2013    Confections     36415.75    12138.58
4   2013    Dairy Products  30337.39    24400.00
5   2013    Seafood         53019.98    27905.25
6   2014    Beverages       81338.06    35400.00
7   2014    Condiments      55948.82    19981.72
8   2014    Confections     44478.36    24710.00
9   2014    Dairy Products  84412.36    32466.00
10  2014    Seafood         65544.19    14565.37

I calculated the cumulative sum of TotalSales, grouped by Year, as follows:

dat <- within(dat, {
  RunningTotal <- ave(TotalSales, Year, FUN = cumsum)
})

and the output is:

    Year    Category        TotalSales AverageCount RunningTotal
1   2013    Beverages       102074.29   22190.06    102074.29
2   2013    Condiments      55277.56    14173.73    157351.85
3   2013    Confections     36415.75    12138.58    193767.60
4   2013    Dairy Products  30337.39    24400.00    224104.99
5   2013    Seafood         53019.98    27905.25    277124.97
6   2014    Beverages       81338.06    35400.00    81338.06
7   2014    Condiments      55948.82    19981.72    137286.88
8   2014    Confections     44478.36    24710.00    181765.24
9   2014    Dairy Products  84412.36    32466.00    266177.60
10  2014    Seafood         65544.19    14565.37    331721.79

How do I calculate the group-wise ratio of consecutive elements in the column RunningTotal (the ratio between RunningTotal[i+1] and RunningTotal[i])?

I've tried using mutate from dplyr

library(dplyr)
dat <- mutate(dat, Ratio = lag(RunningTotal) / RunningTotal)

and I get incorrect output (notice the NAs):

    Year    Category       TotalSales AverageCount  RunningTotal Ratio
1   2013    Beverages       102074.29   22190.06    102074.29   NA
2   2013    Condiments      55277.56    14173.73    157351.85   0.6487009
3   2013    Confections     36415.75    12138.58    193767.60   0.8120648
4   2013    Dairy Products  30337.39    24400.00    224104.99   0.8646287
5   2013    Seafood         53019.98    27905.25    277124.97   0.8086784
6   2014    Beverages       81338.06    35400.00    81338.06    NA
7   2014    Condiments      55948.82    19981.72    137286.88   0.5924678
8   2014    Confections     44478.36    24710.00    181765.24   0.7552978
9   2014    Dairy Products  84412.36    32466.00    266177.60   0.6828720
10  2014    Seafood         65544.19    14565.37    331721.79   0.8024122

How do I get the desired output as shown below?

Year    Category       TotalSales AverageCount RunningTotal    Ratio
2013    Beverages       102074.29   22190.06    102074.29   1.5415424393
2013    Condiments      55277.56    14173.73    157351.85   1.2314288011
2013    Confections     36415.75    12138.58    193767.6    1.1565658552
2013    Dairy Products  30337.39    24400       224104.99   1.2365854504
2013    Seafood         53019.98    27905.25    277124.97   0.2935067887
2014    Beverages       81338.06    35400       81338.06    1.6878553533
2014    Condiments      55948.82    19981.72    137286.88   1.3239811408
2014    Confections     44478.36    24710       181765.24   1.4644032049
2014    Dairy Products  84412.36    32466       266177.6    1.2462423209
2014    Seafood         65544.19    14565.37    331721.79   0

Sample data :

dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments", 
"Confections", "Dairy Products", "Seafood"), class = "factor"), 
    TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98, 
    81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06, 
    14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710, 
    32466, 14565.37)), .Names = c("Year", "Category", "TotalSales", 
"AverageCount"), class = "data.frame", row.names = c(NA, -10L
))
  • You've got the correct result, just reversed. In other words, just reverse your last line, as in mutate(dat, Ratio = RunningTotal/lag(RunningTotal)) Commented May 7, 2015 at 9:12
  • Well, I'm getting NAs in between.. dat$Ratio gives NA 1.541542 1.231429 1.156566 1.236585 NA 1.687855 1.323981 1.464403 1.246242. How do I avoid that? And if I write a function to divide two numbers, please let me know how do I pass it appropriately, using R's aggregate functions. Thanks in advance. Commented May 7, 2015 at 11:18
  • You should get an NA because you are using lag, but on your data I got it only once, and all the other values matched your desired output. Commented May 7, 2015 at 11:28
  • Well.. if I write a function called divide(x,y), how do I call it using the within() function? I get an error saying object 'FUN' of mode 'function' was not found Commented May 7, 2015 at 11:33
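
On the within() question in the last comment, here is one possible base R sketch. It is not taken from the thread: the divide() helper is a hypothetical stand-in for the function mentioned in the comment, and the data frame is rebuilt from the sample values in the question. The idea is to wrap the helper in an anonymous function passed to ave() as the named FUN argument, so the division happens within each Year.

```r
# Sample data rebuilt from the question (only the columns needed here)
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

# Hypothetical helper, standing in for the divide(x, y) from the comment
divide <- function(x, y) x / y

dat <- within(dat, {
  # Running total per year, as in the question
  RunningTotal <- ave(TotalSales, Year, FUN = cumsum)
  # Next value divided by the current one, within each year;
  # the last row of each year has no next value, so it becomes NA
  Ratio <- ave(RunningTotal, Year,
               FUN = function(x) divide(c(x[-1], NA), x))
})

print(dat)
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated by ave() as a grouping variable.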

1 Answer

The dplyr way of doing your first operation is:

dat <- dat %>% 
  group_by(Year) %>% 
  mutate(RunningTotal = cumsum(TotalSales)) %>% 
  ungroup()
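
As a quick sanity check, the grouped cumsum can be compared with the ave() version from the question. This is only a sketch: the data frame is rebuilt from the sample values in the question, and the names base_total and dplyr_total are made up for the comparison.

```r
library(dplyr)

# Sample data rebuilt from the question (only the columns needed here)
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

# Base R running total per year, as in the question
base_total <- ave(dat$TotalSales, dat$Year, FUN = cumsum)

# dplyr running total per year
dplyr_total <- dat %>%
  group_by(Year) %>%
  mutate(RunningTotal = cumsum(TotalSales)) %>%
  ungroup() %>%
  pull(RunningTotal)

# Stops with an error if the two approaches disagree
stopifnot(isTRUE(all.equal(base_total, dplyr_total)))
```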

Then to add the ratios, use

dat %>% 
  mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0))

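Though I'd be tempted to make that last value NA, not 0. The ratio for 2013 Seafood (0.2935067887) doesn't make much sense either: it divides the 2014 Beverages running total by the 2013 Seafood one, so it crosses the year boundary. To avoid that, skip the ungrouping so the ratio is computed within each year. Something like this: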

dat %>% 
  group_by(Year) %>% 
  mutate(
    RunningTotal = cumsum(TotalSales),
    Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
  )
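
A possible variant of the grouped version uses dplyr's lead(), which shifts values the other way from lag() and fills the end of each group with NA. This is a sketch rather than part of the original answer: the data frame is rebuilt from the sample values in the question, and res is just an illustrative name.

```r
library(dplyr)

# Sample data rebuilt from the question (only the columns needed here)
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

res <- dat %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    # Next running total divided by the current one, within each year;
    # the last row of each year has no next value, so lead() gives NA
    Ratio = lead(RunningTotal) / RunningTotal
  ) %>%
  ungroup()

print(res)
```

Because the data stay grouped while Ratio is computed, the last row of 2013 gets NA instead of a ratio that reaches into 2014.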

3 Comments

Or just slightly modifying OPs code Ratio = c((RunningTotal/lag(RunningTotal))[-1L], NA)
@Richie yeah. pipe-lining! Helps in many ways! Cheers, Sir.. !
@DavidArenburg very well, it's effective too. Thanks much, Sir!
