Computing Variables in R from Multiple Values in the Same Variable

Question

Imagine we have a data set called df, and that this data set is composed of two variables called year and x1:

year <- c(2000, 2001, 2002, 2003, 2004)
x1 <- c(7, 8, 6, 3, 3)
df <- data.frame(year, x1)

My task is to compute two new variables out of x1. The first variable is cSum, which must reflect the sum of the values of x1 over the last two years (the current year and the previous one). The second variable is cMax, which must reflect the highest value of x1 over the last three years (the current year and the two previous ones).

The outcome should be as follows:

year  x1  cSum  cMax
2000   7     
2001   8    15     
2002   6    14     8
2003   3     9     8
2004   3     6     6
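Here is a minimal base-R sketch of the two definitions. It assumes each window includes the current year, which is what the expected output implies. lag_k is a hypothetical helper, not an existing function:

```r
x1 <- c(7, 8, 6, 3, 3)

# Hypothetical helper: lag x by k >= 1 positions, padding the front with NA
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# cSum: current year plus the previous year
cSum <- x1 + lag_k(x1, 1)

# cMax: element-wise maximum over the current year and the two previous years
cMax <- pmax(x1, lag_k(x1, 1), lag_k(x1, 2))
```

Any window that reaches before the first year yields NA, which matches the blanks in the expected output.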

How can I compute the cSum and cMax variables above?

Thanks!

MichaelChirico · Accepted Answer · 2016-07-20 23:58:31Z


Using data.table:

library(data.table)
setDT(df)

First, a convoluted way; since transpose is optimized, this may be faster (untested):

df[ , cSum := transpose(lapply(transpose(shift(x1, 0:1)), sum))]
df[ , cMax := transpose(lapply(transpose(shift(x1, 0:2)), max))]

shift is essentially a lag operator; we want lags 0, 1, and (for cMax) 2 to get the current and prior 1 (or 2) periods.
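The following sketch shows what shift returns when it is given a vector of offsets. It assumes the data.table package is installed:

```r
library(data.table)

x1 <- c(7, 8, 6, 3, 3)

# With several offsets, shift() returns a list holding one lagged copy of x1 per offset:
# element 1 is lag 0 (x1 itself), element 2 is lag 1, element 3 is lag 2.
# Positions with no earlier value are filled with NA.
lags <- shift(x1, 0:2)
str(lags)
```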

Alternatively:

df[ , cSum := rowSums(do.call(cbind, shift(x1, 0:1)))]
df[ , cMax := do.call(pmax, shift(x1, 0:2))]

Both give the same output:

df
#    year x1 cSum cMax
# 1: 2000  7   NA   NA
# 2: 2001  8   15   NA
# 3: 2002  6   14    8
# 4: 2003  3    9    8
# 5: 2004  3    6    6

The thing making this messy is that when shift returns more than one lag, it returns a list; unfortunately, this list is the transpose of what we need (we're doing a row-wise operation, but the result is produced in a column-friendly way). The first option transposes the list into a more manageable form, does the row-wise operation, and then transposes back into the columnar form.
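The sketch below breaks the first option into its steps so the intermediate shapes are visible. It assumes data.table is installed:

```r
library(data.table)

x1 <- c(7, 8, 6, 3, 3)

# Column-oriented: one list element per lag, i.e. list(lag0, lag1)
cols <- shift(x1, 0:1)

# Row-oriented: one list element per row, e.g. rows[[2]] is c(8, 7)
rows <- transpose(cols)

# Apply the row-wise function, then transpose the list of scalars
# back into a single column (a list of length 1) and extract it
cSum <- transpose(lapply(rows, sum))[[1]]
cSum
```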

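The second option binds the lags into a matrix (do.call(cbind, ...)) and applies rowSums to it for cSum. For cMax, do.call passes the lagged vectors straight to pmax, which takes their element-wise maximum.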
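The second option can be spelled out step by step in the following sketch, which again assumes data.table is installed:

```r
library(data.table)

x1 <- c(7, 8, 6, 3, 3)

# cbind the lagged vectors into a 5 x 2 matrix: one row per year, one column per lag
m <- do.call(cbind, shift(x1, 0:1))
cSum <- rowSums(m)  # NA for any row containing an NA (na.rm = FALSE by default)

# pmax() compares its vector arguments position by position,
# so do.call() hands it the three lagged vectors as separate arguments
cMax <- do.call(pmax, shift(x1, 0:2))
```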

edited Jul 20, 2016 at 23:58

answered Jul 20, 2016 at 23:33



Comments

Bg1850 Over a year ago

Is transpose needed? This would achieve the same: df[, cSum := x1 + shift(x1, 1, type = "lag")]

MichaelChirico Over a year ago

@Bg1850 I was actually going to add that; thanks for pointing it out. That approach is not very extensible (e.g., summing 10 periods), but it is certainly more pleasant in this case.
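One way to keep the pairwise-addition idea but make it extend to any window length is to fold + over a list of lags with Reduce. This is a sketch that assumes data.table is installed. roll_sum_k is a hypothetical helper and does not come from the thread:

```r
library(data.table)

# Hypothetical helper: sum of x over the current and previous k - 1 positions,
# NA until k values are available
roll_sum_k <- function(x, k) {
  lags <- shift(x, 0:(k - 1))              # k lagged copies of x
  if (!is.list(lags)) lags <- list(lags)   # a single offset (k = 1) returns a plain vector
  Reduce(`+`, lags)                        # element-wise sum across the lags
}

x1 <- c(7, 8, 6, 3, 3)
roll_sum_k(x1, 2)  # same as x1 + shift(x1, 1)
roll_sum_k(x1, 3)
```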

neutral Over a year ago

Thanks! One more thing, if possible: How should I edit the code if I want to do this without the lag? That is, in a way that should result in the NA values going to the bottom of the column, rather than the top?

MichaelChirico Over a year ago

You mean leading instead of lagging? Simply negate the indices.

neutral Over a year ago

I tried, but negative numbers (0:1) return an error.
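Instead of negating the offsets, you can pass type = "lead" to shift. This makes it look forward rather than back, so the NAs land at the bottom of the column. The sketch below applies that argument to the answer's second option and assumes data.table is installed:

```r
library(data.table)

x1 <- c(7, 8, 6, 3, 3)

# type = "lead" takes values from later positions, padding the end with NA
cSumLead <- rowSums(do.call(cbind, shift(x1, 0:1, type = "lead")))
cMaxLead <- do.call(pmax, shift(x1, 0:2, type = "lead"))
```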


Jacob H · Accepted Answer · 2016-07-21 00:54:23Z


Here is an approach utilizing a lag operator. Essentially, I'm augmenting your data so as to minimize the need for loops. In doing so, I'm increasing the amount of memory used. This approach may make sense if you are going to do more time series analysis with this data set. In this answer I use the zoo package, which is my favorite time series package. However, there are many others, such as ts and xts (which is generally faster than zoo).

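library(zoo)

# Note: this example adds a sixth year (2005) to the original data
year <- c(2000, 2001, 2002, 2003, 2004, 2005)
x1 <- c(7, 8, 6, 3, 3, 6)
df <- data.frame(year, x1)

# Convert x1 to a zoo series indexed by year
dfZ <- zoo(df[,-1], order.by = df[,1])

# Add lags 1 and 2 as extra columns (in zoo, a negative k refers to earlier values);
# merge() pads the missing leading values with NA
dfZ <- merge(dfZ, lag(dfZ, seq(-1, -2)))

names(dfZ) <- paste0("L", seq(0,2))

# Row-wise sum over the current and previous year; max over the last three years
dfZ$cSum <- rowSums(dfZ[, c("L0", "L1")])
dfZ$cMax <- apply(dfZ[, c("L0", "L1", "L2")], 1, max)

dfZ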
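As an alternative within zoo, you can compute the same right-aligned windows with rollapply instead of building lag columns. This is a different technique from the lag-and-merge approach above. The sketch assumes zoo is installed and uses the six-year x1 from that example:

```r
library(zoo)

x1 <- c(7, 8, 6, 3, 3, 6)

# align = "right": each window ends at the current observation;
# fill = NA pads the first (width - 1) positions so the length is unchanged
cSum <- rollapply(x1, width = 2, FUN = sum, align = "right", fill = NA)
cMax <- rollapply(x1, width = 3, FUN = max, align = "right", fill = NA)
```

This skips the intermediate L0/L1/L2 columns, so it uses less memory, but it only gives you the rolling results and not the lagged series.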

edited Jul 21, 2016 at 0:54

answered Jul 20, 2016 at 23:47

