I see the following as a programming exercise, rather than a statistically grounded way of doing things.

Basically, I'd like to run N logistic regressions, each with a single predictor variable, and for each one store the variable name together with its chi-squared value. After all regressions have run, I want to display the predictor variables ordered by chi-squared from highest to lowest.

So far I have the following:

local depvar    binvar1
local indepvars predvar1 predvar2 predvar3

* expand and check collinearity *
_rmdcoll `depvar' `indepvars', expand
local indepvars "`r(varlist)'"

* first order individual variables by best chi-squared *
local vars
local chis
foreach v in `indepvars' {
    di "RUN: logistic `depvar' `v'"
    quietly logistic `depvar' `v'

    * check if variable is not omitted (constant and iv) *
    if `e(rank)' < 2 {
        di "OMITTED (rank < 2): `v'"
        continue
    }

    * check if chi-squared is > 0 *
    if `e(chi2)' <= 0 {
        di "OMITTED (chi2 <= 0): `v'"
        continue
    }

    * store *
    local vars "`vars' `v'"
    local chis "`chis' `e(chi2)'"
    di "ADDED: `v' (chi2: `e(chi2)')"
}

* ... now sort each variable (from varlist vars) by chi2 (from varlist chis) ... * 

How would I sort the variables by their returned chi-squared values at the point marked by the last comment, and then display the list of variables with their chi-squared values ordered from highest to lowest?

To be clear, if the following varlists resulted from the above:

local vars predvar1 predvar2 predvar3
local chis 2 3 1

Then I would like to get something like the following:

local ordered predvar2 3 predvar1 2 predvar3 1

Or, alternatively,

local varso predvar2 predvar1 predvar3
local chiso 3 2 1

1 Answer

Here is one way to do it.

local depvar    binvar1
local indepvars predvar1 predvar2 predvar3

* expand and check collinearity *
_rmdcoll `depvar' `indepvars', expand
local indepvars "`r(varlist)'"

* first order individual variables by best chi-squared *

gen chisq = . 
gen vars = "" 
local i = 1 

foreach v in `indepvars' {
     di "RUN: logistic `depvar' `v'"
     quietly logistic `depvar' `v'

     * check if variable is not omitted (constant and iv) *
     if `e(rank)' < 2 {
          di "OMITTED (rank < 2): `v'"
     }

     * check if chi-squared is > 0 *
     else if `e(chi2)' <= 0 {
          di "OMITTED (chi2 <= 0): `v'"
     }

     * store *
     else {  
          quietly replace vars  = "`v'" in `i' 
          quietly replace chisq = -e(chi2) in `i' 
          local ++i   
          di "ADDED: `v' (chi2: `e(chi2)')"
     }
}

sort chisq
replace chisq = -chisq 
l vars chisq if chisq < ., noobs 
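
If the ordered results are wanted in local macros, as the question asks, one possible sketch is to read them back from the sorted variables. It assumes the code above has just run in the same session, so that `i' is one more than the number of stored results; varso and chiso are the macro names from the question, while n and j are names introduced here for illustration.

```stata
* collect the sorted results into two parallel local macros
* (assumes vars and chisq were filled and sorted by the code above)
local varso
local chiso
local n = `i' - 1
forvalues j = 1/`n' {
    * `=exp' evaluates the expression and inserts the result
    local varso `varso' `=vars[`j']'
    local chiso `chiso' `=chisq[`j']'
}
di "`varso'"
di "`chiso'"
```

The chi-squared values are inserted as stored in chisq, so they may show more digits than the listing does.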

3 Comments

Very smart. As an aside, do you think it might be worth using tempfile here instead of working in the main dataset (e.g. if it is very large or something)?
There is an assumption in my method that the number of variables being used is no greater than the number of observations. If that were wrong there would be some point in using an external file, but presumably the modelling wouldn't then work. Otherwise, I can't see any point in using another file. There is some awkwardness in having variables not aligned with others. There could be some attraction in using gsort to avoid the awkwardness of negating chi-square results and negating them back after sort. (sort puts lowest first.)
You can add the logit option to the _rmcoll command to take care of perfect separation in addition to multicollinearity.
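
For what it is worth, the tempfile and gsort ideas raised in the comments above can be combined. The following is only a sketch, not the answerer's method: it posts each result to a temporary file with postfile, then sorts that file in descending order with gsort, so the main dataset is left untouched and no negation is needed. The names memhold, results, varname and chi2 are chosen here for illustration, and str32 assumes that no (expanded) variable name is longer than 32 characters.

```stata
local depvar    binvar1
local indepvars predvar1 predvar2 predvar3

* set up a temporary results file: one row per retained predictor
tempname memhold
tempfile results
postfile `memhold' str32 varname double chi2 using "`results'"

foreach v of local indepvars {
    quietly logistic `depvar' `v'
    * same checks as above: rank of at least 2 and positive chi-squared
    if e(rank) >= 2 & e(chi2) > 0 {
        post `memhold' ("`v'") (e(chi2))
    }
}
postclose `memhold'

* sort and list the results without disturbing the main dataset
preserve
use "`results'", clear
gsort -chi2
list varname chi2, noobs
restore
```

Because the results live in their own file, the number of predictors is no longer limited by the number of observations, and the listing does not need the if chisq < . qualifier.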
