
I have a panel data:

[Screenshot: Panel Data set - gravity model international trade]

My R code:

library(readxl)  # read_excel()
library(dplyr)   # glimpse(), %>%, group_by(), mutate()
library(plm)     # pdata.frame()

#importing dataset
df <- read_excel("DataSet_Final.xlsx",
                 col_types = c("text", "numeric", "text",
                               "numeric", "numeric", "numeric",
                               "numeric", "numeric", "numeric",
                               "numeric", "text", "numeric", "numeric",
                               "numeric", "numeric", "numeric",
                               "numeric", "numeric", "numeric",
                               "numeric", "numeric", "numeric",
                               "numeric", "numeric", "numeric"))
glimpse(df)

# Transforming some variables into log, for better interpretation and normalization of the distribution
df$log_pop <- log(df$Population)
df$log_dist <- log(df$Distance)
df$log_GDP <- log(df$GDP)
df$log_Trade <- log(df$Trade)

# Dropping unnecessary variable
df$Helper <- NULL
head(df)

# Assign the result; otherwise group_id is only printed and then discarded
df <- df %>%
  group_by(Year, CountryName) %>%
  mutate(group_id = cur_group_id()) %>%
  ungroup()

panel_data <- pdata.frame(df, index = c("CountryName", "Counterpart_Country_Name", "Year"))

Warning:

Warning message:
In pdata.frame(df, index = c("CountryName", "Counterpart_Country_Name",  :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
Sample data:

dput(read.table(text = "CountryName CountryCode Counterpart_Country_Name TradePartnerCode Year Export
Belgium 124 Austria 122 1997 1.82394E+14
Belgium 124 Austria 122 1998 2.01838E+14
Belgium 124 Austria 122 1999 1968240347.9
Belgium 124 Austria 122 2000 1931467793
Belgium 124 Austria 122 2001 2067659120
Belgium 124 Austria 122 2002 2260078352
Belgium 124 Austria 122 2003 2684795303", header = TRUE))

What I am trying to achieve is a fixed effects regression, but I have run into the warning shown above. How can I deal with it? The thing is, I cannot really drop observations, as each country trades with other countries over multiple years.

I have looked for answers on Stack Overflow, but none of the solutions I found helped with this problem.

  • Hi 0klahoma, welcome to Stack Overflow. It would be helpful if you would share a reproducible example including a small sample of your data set; you will likely get more helpful answers. You can do this best by using dput(head(df)). Edit your question and include the output of the structure(...). Commented Jul 12, 2023 at 15:56
  • Thank you David, I have included a screenshot of the dataset. I am new to Stack Overflow; if there is a better way to provide the Excel sheet, I would happily revise my post and include it the correct way. Commented Jul 13, 2023 at 21:20
  • Please see here: meta.stackoverflow.com/questions/285551/…. Commented Jul 14, 2023 at 8:17
  • And as David says you can easily provide sample data using the dput function. Commented Jul 14, 2023 at 8:17

3 Answers


Thank you @0klahoma for providing more information. I haven't used this package or done the type of modeling that you're doing, but here is a potential solution.

The warning seems to come from passing three index variables to pdata.frame. I can reproduce it like this:

> df <- data.frame(country=rep(c("A","B"), each=4),
                   partner=rep(c("C","D"), times=4),
                   year=rep(91:92, each=2))
> df
  country partner year
1       A       C   91
2       A       D   91
3       A       C   92
4       A       D   92
5       B       C   91
6       B       D   91
7       B       C   92
8       B       D   92

> pdf<-pdata.frame(df, index = c("country", "partner", "year"))
Warning message:
In pdata.frame(df, index = c("country", "partner", "year")) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")

It seems to have a problem when combinations of the country and partner levels are not unique. In other words, it warns because combinations such as A-C and B-D each occur more than once, irrespective of year.
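To see which combinations are meant, a minimal base R sketch (no plm needed) can count each country-partner combination in the toy data. It assumes, as argued above, that these repeated combinations are what trigger the warning; `pair_counts` and `repeated_pairs` are just illustrative names.

```r
# Toy data from this answer (placeholder values)
df <- data.frame(country = rep(c("A", "B"), each = 4),
                 partner = rep(c("C", "D"), times = 4),
                 year    = rep(91:92, each = 2))

# Count how often each country-partner combination occurs, ignoring year
pair_counts <- table(paste(df$country, df$partner, sep = "-"))
print(pair_counts)

# Combinations that occur more than once (here: all of them, once per year)
repeated_pairs <- names(pair_counts)[as.vector(pair_counts) > 1]
print(repeated_pairs)
```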

Here is something to try: concatenate country and partner into a single pair identifier, so that you pass only two index variables to pdata.frame: the country-partner combination as the individual and the year as the time period. I don't get the warning with this.

> df$country.partner <- paste(df$country, df$partner, sep=".")
> df
  country partner year country.partner
1       A       C   91             A.C
2       A       D   91             A.D
3       A       C   92             A.C
4       A       D   92             A.D
5       B       C   91             B.C
6       B       D   91             B.D
7       B       C   92             B.C
8       B       D   92             B.D

> pdf<-pdata.frame(df, index = c("country.partner", "year"))
[No Warning]
> pdf
       country partner year country.partner
A.C-91       A       C   91             A.C
A.C-92       A       C   92             A.C
A.D-91       A       D   91             A.D
A.D-92       A       D   92             A.D
B.C-91       B       C   91             B.C
B.C-92       B       C   92             B.C
B.D-91       B       D   91             B.D
B.D-92       B       D   92             B.D

It is a bit unintuitive, but I think the expected input is one unique individual index crossed once with a time variable (two arguments, level + time, instead of three, level1 + level2 + time).
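Applied to the question's column names, the same idea might look like the sketch below. It is hypothetical: the data frame holds placeholder values rather than the real data set, `pair_id` is a made-up column name, and the pdata.frame() call only runs if plm is installed.

```r
# Hypothetical toy data using the column names from the question;
# the values are placeholders, not real trade figures
df <- data.frame(CountryName              = rep(c("A", "B"), each = 4),
                 Counterpart_Country_Name = rep(c("C", "D"), times = 4),
                 Year                     = rep(1997:1998, each = 2))

# One identifier per exporter-partner pair, e.g. "A.C"
df$pair_id <- paste(df$CountryName, df$Counterpart_Country_Name, sep = ".")

# Each (pair_id, Year) couple should occur only once
stopifnot(!anyDuplicated(df[c("pair_id", "Year")]))

# Build the panel with two index variables, only if plm is installed
if (requireNamespace("plm", quietly = TRUE)) {
  panel_data <- plm::pdata.frame(df, index = c("pair_id", "Year"))
  print(plm::pdim(panel_data))
}
```

With a pair identifier as the individual, a fixed effects model on panel_data would then absorb pair-specific characteristics rather than country-specific ones.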

See if this works for you.


You could try the fixest package for your problem. Here is an example with four fixed-effect variables, using the trade data set that comes with fixest:

library(fixest)
data(trade)
gravity = feols(Euros ~ log(dist_km) | Origin + Destination + Product + Year, trade)
print(gravity)

Output:

OLS estimation, Dep. Var.: Euros
Observations: 38,325 
Fixed-effects: Origin: 15,  Destination: 15,  Product: 20,  Year: 10
Standard-errors: Clustered (Origin) 
              Estimate Std. Error  t value   Pr(>|t|)    
log(dist_km) -66754618   14306507 -4.66603 0.00036385 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 116,958,376.0     Adj. R2: 0.303863
                      Within R2: 0.055781


With the index argument of plm's pdata.frame (or of the function plm directly), you do not specify the fixed effects. You specify the panel structure of the data, i.e., which variable identifies the observational unit ("individual") and which variable specifies the time periods ("time"): two dimensions.

Your observational unit seems to be the country (identified by either CountryName or CountryCode) and your time dimension is Year, so you would simply do:

panel_data <- pdata.frame(df, index = c("CountryName", "Year"))
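Whether this index works depends on each country appearing at most once per year. A small base R check can tell before calling pdata.frame; the helper `count_dup_couples` is hypothetical (not part of plm), the multi-partner data are placeholders, and the second data frame reuses the country, partner, and years of the sample rows in the question.

```r
# Hypothetical helper: number of rows whose (id, time) couple already appeared above
count_dup_couples <- function(data, id, time) {
  sum(duplicated(data[c(id, time)]))
}

# Placeholder data: country A reports trade with two partners in each year
multi_partner <- data.frame(CountryName              = rep("A", 4),
                            Counterpart_Country_Name = rep(c("C", "D"), times = 2),
                            Year                     = rep(1997:1998, each = 2))
print(count_dup_couples(multi_partner, "CountryName", "Year"))  # prints [1] 2

# Country, partner and years from the sample rows in the question (Belgium -> Austria)
sample_rows <- data.frame(CountryName              = "Belgium",
                          Counterpart_Country_Name = "Austria",
                          Year                     = 1997:2003)
print(count_dup_couples(sample_rows, "CountryName", "Year"))    # prints [1] 0
```

A non-zero count means some country appears more than once in a year, which is the situation in which pdata.frame is likely to warn about duplicate id-time couples for this index.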

2 Comments

I have actually tried this before, but it gave me the same warning about duplicate id-time couples, which aren't really duplicates, since each country has observations with multiple partner countries over different years. Thank you for your input!
Not sure if this is now solved... your data does not look like there are more than one observation per country for any given year, but the data's screenshot is hard to read.
