How do I store regression results from loops in Stata?

Question

I have built a model which basically does the following:

run regressions on single time period
organise stocks into quantiles based on coefficient from linear regression
statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
store quantile 1 portfolio and quantile 10 return for the last period

The pair of variables are just the final entries in the timeframe. However, I intend to extend the single time period to rolling through a large timeframe, in essence:

for i in timeperiod {
    organise stocks into quantiles based on coefficient from linear regression
    statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
    store quantile 1 portfolio and quantile 10 return for the last period
}

The data I'm after are the portfolio 1 and portfolio 10 returns for the final day of each timeframe (built using the previous 3 years of data). This should result in a time series of returns (my data cover 60 years, and the first 3 are needed to build the first result, so 57 years), which I can then regress against each other.

regress portfolio 1 against portfolio 10

I am coming from an R background, where storing a variable in a vector is very simple, but I'm not quite sure how to go about this in Stata.

In the end I want a 2xn matrix (a separate dataset) of numbers, each pair being results of one run of a rolling regression. Sorry for the very vague description, but it's better than explaining what my model is about. Any pointers (even if it's to the right manual entry) will be much appreciated. Thank you.

EDIT: The actual data I want to store is just a variable. I made it confusing by adding regressions. I've changed the code to better represent what I want.

Give us more specific code, please. Start with webuse nlswork, clear; give the kind of regression for each year that you are interested in, or at least fake something with regress; and point out what it is that you want to store for each regression.
– StasK, Commented Aug 19, 2014 at 13:23
I have expanded the response; hopefully it should now be a bit clearer. The actual code is very long, but this should give a good overview. I should have made it clear that it's not regression coefficients I want to store, but simply variables resulting from the regressions.
– Mach, Commented Aug 20, 2014 at 3:38
In Stata terms, this won't be a 2 x n matrix. You rather want a data set. (Likewise in R, you probably wouldn't say you wanted a matrix, but rather a data frame.)
– StasK, Commented Aug 21, 2014 at 13:27

Roberto Ferrer · Accepted Answer · 2014-08-19 19:05:19Z

3

Sounds like a case for either rolling or statsby, depending on what exactly you want to do. These are prefix commands: you put them in front of your regression command, and rolling or statsby takes care of both the looping and the storing of results for you.
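As a minimal sketch of the rolling prefix, the lines below run a regression over a moving window and write one row of coefficients per window to a separate file. The example data set (lutkepohl2, fetched with webuse), the 12-period window, and the file name rolling_results are assumptions chosen for illustration; they are not the asker's stock data or 3-year window.

```stata
* Sketch only: example data and window length are illustrative assumptions
webuse lutkepohl2, clear
tsset qtr

* One regression per 12-quarter window; coefficients and standard errors
* are written to rolling_results.dta instead of replacing the data in memory
rolling _b _se, window(12) saving(rolling_results, replace): ///
    regress dln_inv dln_inc dln_consump

* Each row holds the window bounds (start, end) and the stored results
use rolling_results, clear
list start end _b_dln_inc in 1/5
```

The saved file is an ordinary data set, so it can be tsset and used in a further regression.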

If you want maximum control, you can do the loop yourself with forvalues or foreach and store the results in a separate file using post. In fact, if you look inside rolling and statsby (using viewsource) you will see that this is what these commands do internally.
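A hand-written loop with post might look like the sketch below. It assumes the grunfeld example data (fetched with webuse) and a hypothetical output file loop_results; the regression is a stand-in for whatever is computed per period.

```stata
* Sketch only: grunfeld is example data, loop_results is a hypothetical file name
webuse grunfeld, clear

* Declare the results file and the variables it will contain
tempname results
postfile `results' int year double(b_mvalue b_kstock) ///
    using loop_results, replace

* Loop over every distinct year and post one row per regression
levelsof year, local(years)
foreach t of local years {
    quietly regress invest mvalue kstock if year == `t'
    post `results' (`t') (_b[mvalue]) (_b[kstock])
}
postclose `results'

* The results are now a data set of their own, one observation per year
use loop_results, clear
list in 1/5
```

Using a tempname for the post handle avoids clashing with any other open postfile.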

answered Aug 19, 2014 at 7:43 by Maarten Buis

edited Aug 19, 2014 at 19:05 by Roberto Ferrer


Mach Over a year ago

Thanks for that. I updated the code a bit to show a bit more detail. Rolling looks ideal, I'll go dig through the manuals to see if it's applicable.

StasK · Accepted Answer · 2014-08-21 14:35:10Z

Unlike R, Stata operates with only one major rectangular object in memory, called (ta-da!) the data set. (It has a multitude of other stuff, of course, but that stuff can rarely be addressed as easily as the data set that was brought into memory with use). Since your ultimate goal is to run a regression, you will either need to create an additional data set, or awkwardly add the data to the existing data set. Given that your problem is sufficiently custom, you seem to need a custom solution.

Solution 1: create a separate data set using post (see help).

use my_data, clear
postfile topost int(time_period) str40(portfolio) double(return_q1 return_q10) ///
     using my_derived_data, replace
* 1. topost is a placeholder name
* 2. I have no clue what you mean by "storing the portfolio", so you'd have to fill in
* 3. This will create the file my_derived_data.dta, 
*    which of course you can name as you wish
* 4. The triple slash is a line continuation: the command continues on the next line

levelsof time_period, local( allyears )
* 5. This will create a local macro allyears 
*    that contains all the values of time_period

foreach t of local allyears {
   regress outcome x1 x2 x3 if time_period == `t', robust
   * 6. the opening and closing single quotes are references to Stata local macros
   *    Here, I am referring to the cycle index t

   * organise stocks into quantiles based on the coefficient from the linear regression:
   * this isn't making huge sense to me, so you'll have to put your own code here
   * don't forget to insert if time_period == `t' as needed
   * something like this:
   predict yhat`t' if time_period == `t', xb
   xtile decile`t' = yhat`t' if time_period == `t', n(10)

   * calculate portfolio returns for stocks based on quantile:
   forvalues q=1/10 {
        * do whatever if time_period == `t' & decile`t' == `q'
   }

   * store quantile 1 portfolio and quantile 10 return for the last period
   * again I am not sure what you mean and how to do that exactly
   * so I'll pretend it is something like
   ratio change / price if time_period == `t' , over( decile`t' )
   post topost (`t') ("whatever text describes the time `t' portfolio") ///
       (_b[_ratio_1:1]) (_b[_ratio_1:10])
   * the last two sets of parentheses may contain whatever numeric answer you are producing
   * (if the _b[] names are rejected, check them with: ratio, coeflegend)
}

postclose topost
* 7. close the file you are creating

use my_derived_data, clear
tsset time_period, yearly
newey return_q10 return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

Solution 2: keep things within your current data set. If the above code looks awkward, you can instead create a weird centaur of a data set with both your original stocks and the summaries in it.

use my_data, clear

gen int collapsed_time_period = .
gen double collapsed_return_q1 = .
gen double collapsed_return_q10 = .
* 1. set up placeholders for your results

levelsof time_period, local( allyears )
* 2. This will create a local macro allyears 
*    that contains all the values of time_period

local T : word count `allyears'
* 3. I now use the local macro allyears as is
*    and count how many distinct values there are of time_period variable

forvalues n=1/`T' {
   * 4. my cycle now only runs for the numbers from 1 to `T'

   local t : word `n' of `allyears'
   * 5. I pull the `n'-th value of time_period

   ** computations as in the previous solution

   replace collapsed_time_period = `t' in `n'
   replace collapsed_return_q1 = (compute) in `n'
   replace collapsed_return_q10 = (compute) in `n'
   * 6. I am filling the pre-arranged variables with the relevant values;
   *    (compute) stands for whatever expression gives your result
}

tsset collapsed_time_period, yearly
* 7. this will likely complain about missing values, so you may have to fix it
newey collapsed_return_q10 collapsed_return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

I avoided statsby as it overwrites the data set in memory. Remember that unlike R, Stata can only remember one data set at a time, so my preference is to avoid excessive I/O operations as they may well be the slowest part of the whole thing if you have a data set of 50+ Mbytes.

If you use statsby with the saving() option it will not overwrite your current data set, but store the results in a separate file specified inside the saving() option.
That was an incredibly helpful answer. You also did quite a good job at seeing what I was trying to do from my obscure summary. I used your first example using the postfile, and it left me with exactly what I need. Thanks for taking the time to give a good example.
No problem, Mach. Maarten, does statsby use post mechanics to do that, or does it preserve, operate in memory, and then restore?
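
The saving() behaviour mentioned in the comments above can be sketched as follows. The auto example data and the file name by_results are assumptions for illustration; the point is only that the data set in memory is left untouched while the results go to a separate file.

```stata
* Sketch only: auto is example data, by_results is a hypothetical file name
sysuse auto, clear

* One regression per level of foreign; results go to by_results.dta
statsby _b _se, by(foreign) saving(by_results, replace): ///
    regress price mpg weight

* The original data are still in memory at this point
describe, short

* Load the stored results: one observation per by-group
use by_results, clear
list
```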

lennon310 · Accepted Answer · 2014-11-17 17:17:01Z

0

I think you're looking for the estout command to store the results of the regressions. (estout is a community-contributed package, installable with ssc install estout.)

answered Nov 17, 2014 at 17:13 by Ganlin Jin

edited Nov 17, 2014 at 17:17 by lennon310
