Syntax: Loop regressions with different subsets

Question

I'm trying to run a pooled regression on different subsets that share the same window length (5 years) but cover different years. I'm having trouble with the syntax of my code; I seem to be doing something wrong in the definition of the subset.

> head(Grunfeld)
  firm year   inv  value capital
1    1 1935 317.6 3078.5     2.8
2    1 1936 391.8 4661.7    52.6
3    1 1937 410.6 5387.1   156.9
4    1 1938 257.7 2792.2   209.2
5    1 1939 330.8 4313.2   203.4
6    1 1940 461.2 4643.9   207.2




library(plm)
data("Grunfeld", package="plm")

#regression
myregression <- list()
Grunfeld_sub <- data.frame()
count <- 1

#loop
for(t in 1940:1950){

Grunfeld_sub[t] <- subset(Grunfeld, year<=t & year>=t-5)
myregression[[count]] <- lm(inv~value + capital, Grunfeld_sub(t))

count<- count+1
}

What am I doing wrong with the syntax? How do I define the subsample correctly?

Another problem: if I want to use the plm package and convert my data.frame (Grunfeld) with the plm.data function, I won't be able to use subset anymore, because it does not seem to work with factor variables (the time variable would become a factor). Is there a possible solution to this? Thank you for your help.

One thing I noticed: Grunfeld_sub(t) should rather be Grunfeld_sub[t]
– talat, Commented Jul 13, 2014 at 14:12

eipi10 · Accepted Answer · 2014-07-13 16:18:34Z


Your code is trying to store an entire subset of Grunfeld in one column of Grunfeld_sub, which is causing the error. You don't actually need to store subsets from previous loops, because you only use the current version of Grunfeld_sub in the current iteration of the loop. You also don't need a separate count variable. Here's a reworking of your code:

# Store each subset regression in myregression
myregression <- list()

# Regression on six-year subsets of Grunfeld
for(t in 1940:1950) {

  myregression[[t-1939]] <- lm(inv ~ value + capital, 
                              subset(Grunfeld, year<=t & year>=t-5))

  # Rename list elements by year range of subset
  names(myregression)[t-1939] <- paste0("Years:",t-5,"-",t)
}

Here are the first two regressions stored in myregression:

> myregression
$`Years:1935-1940`

Call:
lm(formula = inv ~ value + capital, data = Grunfeld_sub)

Coefficients:
(Intercept)        value      capital  
   -3.65240      0.08283      0.11033  


$`Years:1936-1941`

Call:
lm(formula = inv ~ value + capital, data = Grunfeld_sub)

Coefficients:
(Intercept)        value      capital  
  -13.77258      0.08614      0.18680

For more detailed output, run lapply(myregression, summary).
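If you do want to keep each subsample around (as the original Grunfeld_sub attempt suggests), a list works where a data frame column does not. The following is only a sketch of that idea, not a replacement for the loop above; the names end_years, subsets, fits and summaries are my own, and it assumes the plm package is installed so that Grunfeld can be loaded from it:

```r
# Grunfeld ships with the plm package (assumed to be installed)
data("Grunfeld", package = "plm")

end_years <- 1940:1950

# One data frame per six-year window, stored in a list
subsets <- lapply(end_years, function(end_year) {
  subset(Grunfeld, year >= end_year - 5 & year <= end_year)
})
names(subsets) <- paste0("Years:", end_years - 5, "-", end_years)

# Fit the same model on every window; lapply keeps the list names
fits <- lapply(subsets, function(d) lm(inv ~ value + capital, data = d))

# Detailed output for each window
summaries <- lapply(fits, summary)

# For example, the R-squared of each window's model
sapply(summaries, function(s) s$r.squared)
```

Keeping the subsets in a named list means subsets[["Years:1935-1940"]] and fits[["Years:1935-1940"]] always refer to the same window.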

To run the plm function, couldn't you just use the Grunfeld data directly and supply the appropriate index argument to plm? For example:

for(t in 1940:1950) {

  myregression[[t-1939]] <- plm(inv ~ value + capital, 
                                data=subset(Grunfeld, year<=t & year>=t-5),
                                index=c("firm","year"))
  names(myregression)[t-1939] <- paste0("Years:",t-5,"-",t)
}
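One caveat about the loop above: as far as I know, plm fits the fixed-effects ("within") model unless told otherwise, so for the pooled regression the question asks about you would pass model = "pooling". A minimal sketch for a single window, assuming plm is installed; the names win, fit_pool and fit_lm are just for illustration:

```r
library(plm)
data("Grunfeld", package = "plm")

# One six-year window, subset while year is still numeric
win <- subset(Grunfeld, year >= 1935 & year <= 1940)

# Pooled OLS via plm; index names the individual and time columns
fit_pool <- plm(inv ~ value + capital, data = win,
                index = c("firm", "year"), model = "pooling")

# The same model via lm, for comparison
fit_lm <- lm(inv ~ value + capital, data = win)

# The two sets of coefficient estimates should agree
all.equal(unname(coef(fit_pool)), unname(coef(fit_lm)))
```

Because the subset is taken from the plain data frame before plm sees it, the factor conversion of the time variable never gets in the way of subset.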

edited Jul 13, 2014 at 16:18

answered Jul 13, 2014 at 15:36


6 Comments

Gritti Over a year ago

Thank you very much for your help! It works like a charm now. What I'm also interested in is the recommended way to aggregate the various regression outputs. Can you just take the averages of the coefficients, or is it better to run a single regression over all the data (all t) to give a quick overview of the coefficients and the model (t-statistics and so on)?

Gritti Over a year ago

Thank you again for your help and your answer regarding the plm issue. I believe the pooling model of the plm function gives the same result as applying lm directly, so my only remaining issue would be the aggregation of the output (assuming that makes sense).

Gritti Over a year ago

Can the question be moved there by a mod? Or should I post a new one? :)

eipi10 Over a year ago

Just post a new one and link to this.

eipi10 Over a year ago

Just to clarify: If your question is about statistical methodology (whether to average the subset coefficients, or create a single model for all the data, etc.) then you should post a question to stats.stackexchange.com. If you just want to extract the coefficients and average them, then t(sapply(myregression, coef)) will give you the coefficients for each subset model.
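As a rough sketch of that last step, assuming myregression is a list of lm fits like the one built in the answer (it is rebuilt here so the snippet runs on its own, with plm installed to supply Grunfeld). The names end_year, coefs and avg_coef are illustrative, and whether a plain average is statistically sensible is a separate question:

```r
data("Grunfeld", package = "plm")

# Rebuild the list of window regressions from the answer
myregression <- list()
for (end_year in 1940:1950) {
  i <- end_year - 1939
  myregression[[i]] <- lm(inv ~ value + capital,
                          subset(Grunfeld, year <= end_year & year >= end_year - 5))
  names(myregression)[i] <- paste0("Years:", end_year - 5, "-", end_year)
}

# One row per window, one column per coefficient
coefs <- t(sapply(myregression, coef))

# Simple (unweighted) average of each coefficient across windows
avg_coef <- colMeans(coefs)
avg_coef
```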
