I am trying to run a two-way fixed-effects panel regression using plm in R. First, I randomly generate some data. Then I create time and firm indices (two-way indexing as usual in a panel dataset) and the explanatory variable of interest (zp.dummy). Then I create a panel data frame. Then I try to fit a two-way fixed-effects panel regression via plm:

library(plm)
set.seed(0); z=rnorm(40)        # generate random data
ztime=rep(c(1:10),4)            # time index
zp.dummy=as.numeric(ztime>5)    # a dummy to distinguish first 5 from last 5 time periods
zfirm=rep(sequence(4), each=10) # firm index
zp.rete=pdata.frame(cbind(ztime,zfirm,zp.dummy,z),index=c("ztime","zfirm"))
                                # create panel data frame indexed by time and firm
colnames(zp.rete)[4]="zp.rete"  # rename a column in the panel data frame
zm1p=plm(zp.rete~zp.dummy, data=zp.rete, index=c("ztime","zfirm"), model="within", effect="twoways")               
                                # run the panel regression via `plm`

When running the last line, I get this error message:

> Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor,  : 
  empty model

Question: What am I doing wrong?

I think I can achieve the desired result via lm:

zftime=as.factor(ztime)         # turn time index into factor
zffirm=as.factor(zfirm)         # turn firm index into factor
zm1 = lm(zp.rete$zp.rete~-1+zp.dummy+zffirm+zftime) 
                                # two-way fixed effects regression via `lm`

How may I replicate the result from lm by plm?

1 Answer

Look carefully at the output of the lm model: one coefficient, belonging to a factor level, is not estimable (it is reported as NA). That is because the regressors are perfectly collinear, so the data do not contain enough information to estimate all of them.

# NA coefficient:
summary(zm1)
model.matrix(zm1) ## looks suspicious
plm::detect.lindep(model.matrix(zm1)) ## collinear columns
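
The rank deficiency can also be cross-checked in base R. The following is a minimal sketch, assuming the same simulated data as in the question (rebuilt here with factor() inline so that the block is self-contained; X is a name introduced only for illustration). It compares the number of columns of the design matrix with its numerical rank and lists the coefficient that lm could not estimate.

```r
# Rebuild the simulated data from the question
set.seed(0); z <- rnorm(40)
ztime <- rep(1:10, 4)               # time index
zfirm <- rep(1:4, each = 10)        # firm index
zp.dummy <- as.numeric(ztime > 5)   # 1 for the last 5 periods, 0 otherwise

# Two-way fixed effects via dummies, as in the question's lm call
zm1 <- lm(z ~ -1 + zp.dummy + factor(zfirm) + factor(ztime))

# Design matrix: a rank below the column count means exact collinearity
X <- model.matrix(zm1)
print(c(columns = ncol(X), rank = qr(X)$rank))

# Name(s) of the coefficient(s) lm reported as NA
print(names(which(is.na(coef(zm1)))))
```

If the rank is smaller than the number of columns, at least one regressor is an exact linear combination of the others, which is what the NA coefficient signals.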

Now, why does plm throw an error? It first transforms the data (two-way within transformation) and then runs a plain linear regression on the transformed data; the transformed right-hand side is called the model matrix. If we look at that model matrix (the data after transformation), we see that its only column consists entirely of zeros. A model whose only regressor is a zero column is not estimable, so plm rightly raises an error.

library(plm)
set.seed(0); z <- rnorm(40)        # generate random data
ztime <- rep(c(1:10),4)            # time index
zp.dummy <- as.numeric(ztime>5)    # a dummy to distinguish first 5 from last 5 time periods
zfirm <- rep(sequence(4), each=10) # firm index
zp.data <- pdata.frame(cbind(ztime, zfirm, zp.dummy, z),index=c("zfirm", "ztime"))
# create panel data frame; note the index order: individual (firm) first, then time
colnames(zp.data)[4] <- "zp.rete"  # rename a column in the panel data frame
# create model frame
mf <- model.frame(zp.data, zp.rete ~ zp.dummy)
# create model matrix
mm <- model.matrix(mf, model = "within", effect = "twoways")
all(mm == 0) # TRUE
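
The same zero column can be reproduced without plm. As a rough sketch, assuming a balanced panel (which holds for this example), the two-way within transformation subtracts the firm mean and the time mean from each observation and adds back the overall mean. The name within_twoways is introduced here for illustration only.

```r
# Indices and dummy as in the example above
ztime <- rep(1:10, 4)               # time index
zfirm <- rep(1:4, each = 10)        # firm index
zp.dummy <- as.numeric(ztime > 5)   # 1 for the last 5 periods, 0 otherwise

# Two-way within transformation for a balanced panel:
# x_it - mean over firm i - mean over period t + overall mean
within_twoways <- zp.dummy -
  ave(zp.dummy, zfirm) -            # firm-specific mean
  ave(zp.dummy, ztime) +            # period-specific mean
  mean(zp.dummy)                    # overall mean

# Largest absolute value left after the transformation
print(max(abs(within_twoways)))
```

Because zp.dummy varies only over time and not across firms, removing the time means already removes all of its variation, so nothing is left for the regression to use.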
6 Comments

Thank you for your answer! The problem with manually constructed time and firm fixed effects in lm is that there is multicollinearity. A solution is to exclude the base category of the time fixed effects and the base category of the firm fixed effects; I was too lazy to do that manually, knowing that lm would take care of this automatically. Now, I expect that plm is smart enough to exclude these base categories when constructing the fixed effects under the hood. Am I wrong? If not, what is causing the trouble?
It comes down to the order of processing. lm sets the last collinear variable's level to NA (it could be any of them; this is a convention). plm needs the factor variables (the indexes) for the data transformation, hence the variable of interest, zp.dummy, is the one affected. It is then a matter of taste whether to output NA or to let the model error. Look at this model's output, where the variable of interest is NA. Is that helpful? lm(zp.rete ~ -1 + zffirm + zftime + zp.dummy, data = zp.data)
So does that mean that my explanatory variable is collinear with the two-way fixed effects dummies (even after having removed their base categories)? Trying to interpret this, I guess I can see the issue: the explanatory variable is a level shift for some time periods. The time fixed effects also measure the level of these periods. So then there are many possible combinations of these that give the exact same fit. Am I getting it right?
My aim is to test whether the level of the dependent variable has shifted in the second part of the sample (the latter 5 time periods). Any idea of how to do that? On paper, I could restrict the time fixed effects as follows: t_1+...+t_5=t_6+...+t_10 and then estimate the level shift (the coef. on the zp.dummy). What would be a sensible way of doing that with plm?
This is rather a statistical question. As zp.dummy and the time dimension contain the same information (in the sense of collinearity), because zp.dummy was constructed from the time dimension, you will run into this issue with any linear regression approach. Why would you need the time effects for the question at hand? Wouldn't the individual dimension be enough: zm2p <- plm(zp.rete~zp.dummy, data=zp.data, model="within", effect="individual") (mind the order of the index attribute in pdata.frame!)?
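
The individual-effects suggestion from the last comment can be sketched as follows. This is an illustration under assumptions, not a definitive recipe: it rebuilds the simulated data, estimates the level shift with firm dummies only (no time dummies) in base R, repeats the estimate by demeaning within firm, and runs the suggested plm call only if plm is installed. The names fit_lsdv, b_lsdv, b_within and pd are introduced here for illustration.

```r
# Rebuild the simulated data from the question
set.seed(0); z <- rnorm(40)
ztime <- rep(1:10, 4)               # time index
zfirm <- rep(1:4, each = 10)        # firm index
zp.dummy <- as.numeric(ztime > 5)   # 1 for the last 5 periods, 0 otherwise

# Firm fixed effects via dummies (no time dummies), so zp.dummy is identified
fit_lsdv <- lm(z ~ zp.dummy + factor(zfirm))
b_lsdv <- unname(coef(fit_lsdv)["zp.dummy"])

# Same slope from the one-way within transformation (demeaning by firm)
x_dm <- zp.dummy - ave(zp.dummy, zfirm)
y_dm <- z - ave(z, zfirm)
b_within <- sum(x_dm * y_dm) / sum(x_dm^2)
print(c(lsdv = b_lsdv, within = b_within))

# The plm call suggested in the comment, run only when plm is available
if (requireNamespace("plm", quietly = TRUE)) {
  pd <- plm::pdata.frame(
    data.frame(zfirm, ztime, zp.dummy, zp.rete = z),
    index = c("zfirm", "ztime")     # individual index first, then time
  )
  zm2p <- plm::plm(zp.rete ~ zp.dummy, data = pd,
                   model = "within", effect = "individual")
  print(coef(zm2p))
}
```

The coefficient on zp.dummy then measures the average level shift between the first and the last five periods, and its t-statistic in summary(fit_lsdv) or summary(zm2p) gives a test of that shift.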