I'm trying to run a pooled regression on different subsets that cover the same time interval (5 years) but start in different years. I'm having trouble with the syntax of my code; I seem to be doing something wrong when defining the subset.

> head(Grunfeld)
  firm year   inv  value capital
1    1 1935 317.6 3078.5     2.8
2    1 1936 391.8 4661.7    52.6
3    1 1937 410.6 5387.1   156.9
4    1 1938 257.7 2792.2   209.2
5    1 1939 330.8 4313.2   203.4
6    1 1940 461.2 4643.9   207.2




library(plm)
data("Grunfeld", package="plm")

#regression
myregression <- list()
Grunfeld_sub <- data.frame()
count <- 1

#loop
for(t in 1940:1950){

Grunfeld_sub[t] <- subset(Grunfeld, year<=t & year>=t-5)
myregression[[count]] <- lm(inv~value + capital, Grunfeld_sub(t))

count<- count+1
}

What am I doing wrong with the syntax? How do I define the subsample correctly?

Another problem: if I want to use the plm package and convert my data.frame (Grunfeld) with the plm.data function, I can no longer use subset, because the time variable becomes a factor variable. Is there a solution for this? Thank you for your help.

  • One thing I noticed: Grunfeld_sub(t) should rather be Grunfeld_sub[t]. (Commented Jul 13, 2014 at 14:12)

1 Answer

The assignment Grunfeld_sub[t] <- subset(...) tries to store an entire subset of Grunfeld in one column of Grunfeld_sub, which is what causes the error. In addition, Grunfeld_sub(t) uses function-call syntax on a data frame. You don't need to keep the subsets from earlier iterations, because each regression uses only the current subset. You also don't need a separate count variable. Here's a reworking of your code:

# Store each subset regression in myregression
myregression <- list()

# Regression on six-year subsets of Grunfeld
for(t in 1940:1950) {

  myregression[[t-1939]] <- lm(inv ~ value + capital, 
                              subset(Grunfeld, year<=t & year>=t-5))

  # Rename list elements by year range of subset
  names(myregression)[[t-1939]] = paste0("Years:",t-5,"-",t)
}

Here are the first two regressions stored in myregression:

> myregression
$`Years:1935-1940`

Call:
lm(formula = inv ~ value + capital, data = Grunfeld_sub)

Coefficients:
(Intercept)        value      capital  
   -3.65240      0.08283      0.11033  


$`Years:1936-1941`

Call:
lm(formula = inv ~ value + capital, data = Grunfeld_sub)

Coefficients:
(Intercept)        value      capital  
  -13.77258      0.08614      0.18680  

For more detailed output, run lapply(myregression, summary).
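As a rough sketch of what that detailed output gives you, the block below rebuilds the list (here with lapply instead of the loop, which is just an alternative way to write it) and pulls the coefficient table and R-squared out of each summary. The names years and mysummaries are introduced only for this illustration.

```r
library(plm)  # used here only to load the Grunfeld data set
data("Grunfeld", package = "plm")

# Rebuild the list of rolling six-year regressions (t-5 through t)
years <- 1940:1950
myregression <- lapply(years, function(t) {
  lm(inv ~ value + capital, data = subset(Grunfeld, year <= t & year >= t - 5))
})
names(myregression) <- paste0("Years:", years - 5, "-", years)

# One summary object per window
mysummaries <- lapply(myregression, summary)

# Coefficient table (estimate, std. error, t value, p value) for the first window
coef(mysummaries[[1]])

# R-squared of every window as a named numeric vector
sapply(mysummaries, function(s) s$r.squared)
```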

To run the plm function, couldn't you just use the Grunfeld data directly and supply the appropriate index argument to plm? For example:

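for(t in 1940:1950) {

  # plm() fits a fixed-effects ("within") model unless model is set;
  # model="pooling" requests the pooled regression asked about in the question
  myregression[[t-1939]] <- plm(inv ~ value + capital, 
                                data=subset(Grunfeld, year<=t & year>=t-5),
                                index=c("firm","year"),
                                model="pooling")
  names(myregression)[[t-1939]] = paste0("Years:",t-5,"-",t)
}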
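
If the goal is a pooled regression, one way to sanity-check the plm route is to compare it with lm on a single window. This is only a sketch: it passes model = "pooling" explicitly (plm otherwise defaults to a within model), and the names win, fit_lm and fit_plm are made up for the example. Subsetting the plain data frame before handing it to plm also keeps year numeric, so subset still works.

```r
library(plm)
data("Grunfeld", package = "plm")

# One six-year window, subset while year is still a plain numeric column
win <- subset(Grunfeld, year <= 1940 & year >= 1935)

# Ordinary least squares on the pooled window
fit_lm <- lm(inv ~ value + capital, data = win)

# Pooled model via plm on the same window
fit_plm <- plm(inv ~ value + capital, data = win,
               index = c("firm", "year"), model = "pooling")

# Compare the two coefficient vectors up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_plm))
```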

6 Comments

Thank you very much for your help! It works like a charm now. I am also interested in the recommended way to aggregate the various regression outputs. Can you just average the coefficients, or is it better to run a single regression over all the data (all t) to give a quick overview of the coefficients and the model (t-statistics and so on)?
Thank you again for your help and your answer regarding the plm issue. I believe that the pooling model of the plm function gives the same result as applying lm directly, so my only remaining issue is the aggregation of the output (presumably that makes sense).
Can the question be moved there by a mod? Or should I post a new one? :)
Just post a new one and link to this.
Just to clarify: If your question is about statistical methodology (whether to average the subset coefficients, or create a single model for all the data, etc.) then you should post a question to stats.stackexchange.com. If you just want to extract the coefficients and average them, then t(sapply(myregression, coef)) will give you the coefficients for each subset model.
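
A minimal sketch of that extraction, assuming the rolling lm fits from the answer (rebuilt here with lapply so the block runs on its own). The name coef_by_window is hypothetical, and the column means are purely descriptive; whether averaging is statistically sensible is the methodology question mentioned above.

```r
library(plm)  # used here only to load the Grunfeld data set
data("Grunfeld", package = "plm")

# Rolling six-year regressions, one per end year
years <- 1940:1950
myregression <- lapply(years, function(t) {
  lm(inv ~ value + capital, data = subset(Grunfeld, year <= t & year >= t - 5))
})
names(myregression) <- paste0("Years:", years - 5, "-", years)

# One row per window, one column per coefficient
coef_by_window <- t(sapply(myregression, coef))
coef_by_window

# Simple average of each coefficient across the windows
colMeans(coef_by_window)
```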