Given this dataset

import pandas as pd
df = pd.DataFrame({'year': [2000,2000,2000,2000,2000,2001,2001,2001,2001,2001,2002,2002,2002,2002,2002], 'metric': [2,3,4,5,6,12,13,14,15,16,22,23,24,25,26]})

running quantile regression for the 0.5 quantile using the Statsmodels package

import statsmodels.formula.api as smf
model = smf.quantreg('metric ~ year', df)
result = model.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

results in this outcome:

[image: Statsmodels regression summary]

SAS' PROC QUANTREG, however, produces [image: SAS PROC QUANTREG output]

I'm confused by the Statsmodels coefficient. Given that the medians for the three years are 4, 14, and 24, shouldn't the coefficient be 10, like SAS'? Changing the kernel and bandwidth options doesn't affect it.
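The reasoning about the medians can be checked directly. This is a small sketch using only the data from the question; the variable names `medians` and `steps` are mine.

```python
import pandas as pd

# Dataset exactly as given in the question.
df = pd.DataFrame({
    "year": [2000] * 5 + [2001] * 5 + [2002] * 5,
    "metric": [2, 3, 4, 5, 6, 12, 13, 14, 15, 16, 22, 23, 24, 25, 26],
})

# Median of the metric within each year.
medians = df.groupby("year")["metric"].median()

# Year-over-year change in the median; a constant step is the slope
# one would expect from a median (q=0.5) regression on year.
steps = medians.diff().dropna()

print(medians)
print(steps)
```

The per-year medians rise by the same amount each year, which is why a slope of 10 on the raw year looks like the natural answer.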

I do see the "condition number is large" message from Statsmodels. If I address this by normalizing the years to 0, 0.5, and 1, I do get a coefficient of 20, which is what SAS produces for the normalized data as well.
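To see how much the normalization changes the conditioning, here is a rough sketch that builds the intercept-plus-year design matrix by hand with NumPy. The helper `design_condition_number` is hypothetical, and I am assuming the figure behind the Statsmodels warning is of the same kind; the printed value in the summary may not match this exactly.

```python
import numpy as np

# Years exactly as in the question: five observations per year.
year = np.repeat([2000.0, 2001.0, 2002.0], 5)

def design_condition_number(x):
    """Condition number of the design matrix [1, x] (intercept plus one regressor)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.cond(X)

cond_raw = design_condition_number(year)

# Rescale the years to 0, 0.5, 1 as described in the question.
year_scaled = (year - year.min()) / (year.max() - year.min())
cond_scaled = design_condition_number(year_scaled)

print(f"raw years:    {cond_raw:.3g}")
print(f"scaled years: {cond_scaled:.3g}")
```

With the raw years the intercept column and the year column are nearly collinear, so the condition number is in the millions; after rescaling it drops to single digits.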

Why does Statsmodels produce a different coefficient for non-normalized data?

  • A high condition number lowers precision and creates problems in the computations. Most likely this causes convergence problems in the iteration loop. (It looks like statsmodels quantile regression is not very robust if the model or data is not well behaved.) Commented Dec 8, 2023 at 20:08
  • Statsmodels does not do any internal transformation, such as scaling the data, to improve optimization. It's up to the user to provide well-behaved data. Whether this will change in statsmodels remains undecided. github.com/statsmodels/statsmodels/issues/1131 Commented Dec 8, 2023 at 20:16
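Following the comments, one possible workaround is to rescale the year yourself and convert the slope back afterwards. This is only a sketch: the column name `year_scaled` and the variable `span` are my own, and the value of 20 on the rescaled data is what the question reports, not something guaranteed for other datasets.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Dataset exactly as given in the question.
df = pd.DataFrame({
    "year": [2000] * 5 + [2001] * 5 + [2002] * 5,
    "metric": [2, 3, 4, 5, 6, 12, 13, 14, 15, 16, 22, 23, 24, 25, 26],
})

# Hypothetical helper column: map the years onto [0, 1] (0, 0.5, 1 here).
span = df["year"].max() - df["year"].min()
df["year_scaled"] = (df["year"] - df["year"].min()) / span

# Median regression on the rescaled regressor, using the fit defaults.
result = smf.quantreg("metric ~ year_scaled", df).fit(q=0.5)

# Slope per unit of the rescaled variable, then converted back to
# "per calendar year" by dividing by the span used for scaling.
slope_scaled = result.params["year_scaled"]
slope_per_year = slope_scaled / span

print(slope_scaled, slope_per_year)
```

Dividing by `span` undoes the rescaling, so the slope can be read in the original units (per year) and compared with the SAS result on the raw data.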
