I am attempting to do some panel analysis using lagged, leading and differenced variables. However, the plm functions do not give the desired results: the lag is not computed separately for each individual. I have looked online, but following this post (Answer_Stack) and using pdata.frame() gave the same problematic results. When I group_by(i) in dplyr, I get the desired result. Can anyone explain what is going on?
# Packages used below
library(plm)
library(dplyr)
# Variables
i <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7)
t <- c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003)
y <- c(0.047136, 0.044581, 0.040973, 0.045536, 0.043952, 0.038797, 0.049942, 0.047440, 0.042193, 0.048503, 0.046816, 0.040292, 0.056089, 0.052054, 0.047078, 0.044223, 0.041516, 0.036947, 0.045608, 0.042028, 0.037878)
x <- c(0.32691, 0.33013, 0.32888, 0.40301, 0.40337, 0.40326, 0.29692, 0.29982, 0.29790, 0.30380, 0.30698, 0.30668, 0.27942, 0.28696, 0.28616, 0.31218, 0.31424, 0.31382, 0.34592, 0.34738, 0.34782)
# Create plm dataframe
dta <- data.frame(i, t, y, x)
pdta <- plm.data(dta, indexes = c("i", "t"))
# Create lagged variable with plm
pdta$l.x <- lag(pdta$x) # Problem: values are shifted across individuals, not within each i
# Create using dplyr
pdta <- pdta %>%
  group_by(i) %>%
  mutate(lag.x = lag(x))
View(pdta)
Note on the answer: even after following the suggested steps, I get this:
> pdta <- pdata.frame(dta, index= c("i", "t"))
> head(cbind(pdta$i, pdta$y, lag(pdta$y)), 10)
       [,1]     [,2]     [,3]
1-2001    1 0.047136       NA
1-2002    1 0.044581 0.047136
1-2003    1 0.040973 0.044581
2-2001    2 0.045536 0.040973
2-2002    2 0.043952 0.045536
2-2003    2 0.038797 0.043952
3-2001    3 0.049942 0.038797
3-2002    3 0.047440 0.049942
3-2003    3 0.042193 0.047440
4-2001    4 0.048503 0.042193
plm, so I have the latest version. If you also have the latest version, a second possibility is that you loaded dplyr after loading plm. dplyr has its own lag function, which masks base R's lag function (stats::lag). plm relies on base R's lag function, so once dplyr is loaded, a plain call to lag() no longer lags within each individual.
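If masking is indeed the cause, one possible workaround is to call the base generic explicitly with stats::lag(). The sketch below assumes plm and dplyr are both installed and that dplyr is attached after plm. It uses only the first three individuals of x from the question to keep it short, and it has not been checked against every plm version.

```r
library(plm)
library(dplyr)  # attached after plm, so a bare lag() now resolves to dplyr::lag

# Shortened copy of the question's data (first three individuals only)
dta <- data.frame(
  i = rep(1:3, each = 3),
  t = rep(2001:2003, times = 3),
  x = c(0.32691, 0.33013, 0.32888,
        0.40301, 0.40337, 0.40326,
        0.29692, 0.29982, 0.29790)
)
pdta <- pdata.frame(dta, index = c("i", "t"))

# stats::lag is the base R generic; with plm loaded it dispatches to
# plm's method for panel series, which lags within each individual
pdta$l.x <- stats::lag(pdta$x)

head(pdta, 6)
```

With this call, the first period of each individual should be NA rather than the last value of the previous individual. Inside a dplyr pipeline, group_by(i) followed by dplyr::lag(x) remains a reasonable alternative, as in the question.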