I'm working with a two-way fixed effects (TWFE) logit model in R using the fixest package. My model includes interaction terms between some fixed effects and a categorical variable called group. To report the coefficients of each fixed effect term, I've included one of the fixed effects as a regular control variable. However, I've noticed that this approach leads to differences in standard errors and p-values compared to the standard TWFE model.
When I include one of the fixed effects (e.g., wave) as a regular control variable instead of as a fixed effect, the standard errors and p-values differ, although the coefficients remain the same. Below is a reproducible example:
library(fixest)
library(dplyr)
# Simulate data
set.seed(12345)
n <- 1000
data <- data.frame(
outcome = rbinom(n, 1, 0.5),
group = factor(sample(c("Group1", "Group2", "Group3"), n, replace = TRUE)),
event1 = rbinom(n, 1, 0.3),
event2 = rbinom(n, 1, 0.2),
event3 = rbinom(n, 1, 0.4),
event4 = rbinom(n, 1, 0.25),
age_group = factor(sample(2:4, n, replace = TRUE)),
gender = factor(sample(c("Male", "Female"), n, replace = TRUE)),
education = factor(sample(c("Low", "Medium", "High"), n, replace = TRUE)),
ethnicity = factor(sample(c("Ethnic1", "Ethnic2"), n, replace = TRUE)),
income_level = factor(sample(1:4, n, replace = TRUE)),
perception = factor(sample(1:3, n, replace = TRUE)),
previous_vote = rbinom(n, 1, 0.4),
struggle = factor(sample(1:3, n, replace = TRUE)),
news_type = factor(sample(c("News1", "News2", "News3", "NoNews"), n, replace = TRUE)),
location_type = factor(sample(c("Urban", "Rural"), n, replace = TRUE)),
city = factor(sample(1:10, n, replace = TRUE)),
wave = factor(sample(paste0("Wave", 1:5), n, replace = TRUE)),
weight = runif(n, 0.5, 2)
)
# Model 1: Including Wave as a fixed effect
model1 <- feglm(
outcome ~ i(group, "Group1") +
i(group, event1, ref = "Group1", ref2 = "0") +
i(group, event2, ref = "Group1", ref2 = "0") +
i(group, event3, ref = "Group1", ref2 = "0") +
i(group, event4, ref = "Group1", ref2 = "0") +
i(age_group, ref = "2") + gender + education +
i(ethnicity, ref = "Ethnic1") + income_level +
i(perception, ref = "2") + as.factor(previous_vote) +
i(struggle, ref = "2") + as.factor(news_type) + location_type | city + wave,
data = data,
family = binomial("logit"),
weights = data$weight,
cluster = ~city + wave
)
summary(model1, vcov = "twoway", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
# Model 2: Including Wave as a regular control variable
model2 <- feglm(
outcome ~ i(group, "Group1") + i(wave, ref = "Wave5") +
i(group, event1, ref = "Group1", ref2 = "0") +
i(group, event2, ref = "Group1", ref2 = "0") +
i(group, event3, ref = "Group1", ref2 = "0") +
i(group, event4, ref = "Group1", ref2 = "0") +
i(age_group, ref = "2") + gender + education +
i(ethnicity, ref = "Ethnic1") + income_level +
i(perception, ref = "2") + as.factor(previous_vote) +
i(struggle, ref = "2") + as.factor(news_type) + location_type | city,
data = data,
family = binomial("logit"),
weights = data$weight,
cluster = ~city + wave
)
summary(model2, vcov = "twoway", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
Questions:
Is the difference in standard errors and p-values between the two models due to the adjustment of degrees of freedom?
Is this a statistical problem (e.g. incidental parameter problem) or purely a software issue? If just a software issue, how can I fix it?
library(fixest).