Ordering a varlist in Stata code

Question

I see the following as a programming exercise, rather than a statistically grounded way of doing things.

Basically, I'd like to run N logistic regressions, each with a single predictor variable, and for each one store the variable name together with its chi-squared value. After all regressions have run, I want to display the predictor variables ordered by chi-squared from highest to lowest.

So far I have the following:

local depvar    binvar1
local indepvars predvar1 predvar2 predvar3

* expand and check collinearity *
_rmdcoll `depvar' `indepvars', expand
local indepvars "`r(varlist)'"

* first order individual variables by best chi-squared *
local vars
local chis
foreach v in `indepvars' {
    di "RUN: logistic `depvar' `v'"
    quietly logistic `depvar' `v'

    * check if variable is not omitted (constant and iv) *
    if `e(rank)' < 2 {
        di "OMITTED (rank < 2): `v'"
        continue
    }

    * check if chi-squared is > 0 *
    if `e(chi2)' <= 0 {
        di "OMITTED (chi2 <= 0): `v'"
        continue
    }

    * store *
    local vars "`vars' `v'"
    local chis "`chis' `e(chi2)'"
    di "ADDED: `v' (chi2: `e(chi2)')"
}

* ... now sort each variable (from varlist vars) by chi2 (from varlist chis) ... *

How would I sort the variables by the chi-squared values collected in the loop, and then display the list of variables with their chi-squared values ordered from highest to lowest?

To be clear, if the following varlists resulted from the above:

local vars predvar1 predvar2 predvar3
local chis 2 3 1

Then I would like to get something like the following:

local ordered predvar2 3 predvar1 2 predvar3 1

Or, alternatively,

local varso predvar2 predvar1 predvar3
local chiso 3 2 1

Nick Cox · Accepted Answer · 2013-04-20 00:01:57Z


Here is one way to do it.

local depvar    binvar1
local indepvars predvar1 predvar2 predvar3

* expand and check collinearity *
_rmdcoll `depvar' `indepvars', expand
local indepvars "`r(varlist)'"

* first order individual variables by best chi-squared *

gen chisq = . 
gen vars = "" 
local i = 1 

foreach v in `indepvars' {
     di "RUN: logistic `depvar' `v'"
     quietly logistic `depvar' `v'

     * check if variable is not omitted (constant and iv) *
     if `e(rank)' < 2 {
          di "OMITTED (rank < 2): `v'"
     }

     * check if chi-squared is > 0 *
     else if `e(chi2)' <= 0 {
          di "OMITTED (chi2 <= 0): `v'"
     }

     * store (chi-squared is negated so that an ascending sort puts the largest first) *
     else {  
          quietly replace vars  = "`v'" in `i' 
          quietly replace chisq = -e(chi2) in `i' 
          local ++i   
          di "ADDED: `v' (chi2: `e(chi2)')"
     }
}

* sort ascending on the negated values, then restore the sign *
sort chisq
replace chisq = -chisq 
list vars chisq if chisq < ., noobs
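
The listing above shows the ordered results, but the question also asked for them as local macros. Here is a minimal sketch of one way to rebuild the two lists. It uses the toy values from the question rather than real regression output, so the `input` data are only an illustrative stand-in; `varso` and `chiso` are the macro names the question proposed.

```stata
* Toy demo with the example values from the question (not real results).
* Note: -clear- drops the data in memory, so run this in a scratch session.
clear
input str8 vars float chisq
"predvar1" 2
"predvar2" 3
"predvar3" 1
end

* Same trick as in the answer: negate, sort ascending, flip the sign back
quietly replace chisq = -chisq
sort chisq
quietly replace chisq = -chisq

* Walk down the sorted observations and rebuild the two lists
local varso
local chiso
quietly count if chisq < .
forvalues j = 1/`r(N)' {
    local varso `varso' `=vars[`j']'
    local chiso `chiso' `=string(chisq[`j'])'
}

di "varso: `varso'"
di "chiso: `chiso'"
```

Because local macros are used, this should be run as a whole (for example from a do-file) rather than line by line. With real chi-squared values, `string()` with an explicit format such as `"%9.3f"` may give tidier entries.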



3 Comments

Tom Over a year ago

Very smart. As an aside, do you think it might be worth using tempfile here instead of working in the main dataset (e.g. if it is very large or something)?

Nick Cox Over a year ago

There is an assumption in my method that the number of variables being used is no greater than the number of observations. If that were wrong there would be some point in using an external file, but presumably the modelling wouldn't then work. Otherwise, I can't see any point in using another file. There is some awkwardness in having variables not aligned with others. There could be some attraction in using gsort to avoid the awkwardness of negating chi-square results and negating them back after sort. (sort puts lowest first.)

Maarten Buis Over a year ago

You can add the logit option to the _rmcoll command to take care of perfect separation in addition to multicollinearity.
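
Two ideas raised in the comments, keeping the results out of the main dataset and using gsort instead of negating, can be combined. The following is only a sketch under stated assumptions: it uses the shipped auto.dta with foreign as the outcome and price, mpg, and weight as predictors purely as stand-ins for the real data, and the names `memhold` and `results` are arbitrary. It posts each chi-squared value to a temporary file with postfile, then sorts that file in descending order.

```stata
* Sketch only: auto.dta and these predictors are stand-ins for the real data.
sysuse auto, clear
local depvar    foreign
local indepvars price mpg weight

* Results go to a temporary file instead of new variables in the main data
tempname memhold
tempfile results
postfile `memhold' str32 vars double chisq using `results'

foreach v of local indepvars {
    quietly logistic `depvar' `v'
    * same checks as in the answer, plus a guard against a missing chi-squared
    if e(rank) >= 2 & e(chi2) > 0 & !missing(e(chi2)) {
        post `memhold' ("`v'") (e(chi2))
    }
}
postclose `memhold'

* gsort -chisq sorts in descending order, so no negating is needed
preserve
use `results', clear
gsort -chisq
list vars chisq, noobs
restore
```

Since the results file has one observation per stored predictor, this version does not depend on the number of predictors being no greater than the number of observations in the main dataset. As with any code using tempname and tempfile, it needs to be run in one go from a do-file.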
