
Using Pandas OLS I am able to fit and use a model as follows:

ols_test = pd.ols(y=merged2[:-1].Units, x=merged2[:-1].lastqu) #to exclude current year, then do forecast method
yrahead=(ols_test.beta['x'] * merged2.lastqu[-1:]) + ols_test.beta['intercept']
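pd.ols was deprecated and then removed from pandas (it is gone as of pandas 0.20), so the snippet above will not run on current versions. The following is only a sketch of the same fit-then-forecast arithmetic. It uses NumPy's np.polyfit, a swapped-in least-squares routine rather than whatever pd.ols used internally. merged2 here is a hypothetical toy frame that borrows only the question's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged2: one row per year with the previous
# quarter's value (lastqu) and the annual total (Units, unknown for the last year)
merged2 = pd.DataFrame(
    {"lastqu": [2000.0, 2200.0, 2400.0, 2600.0, 2651.0],
     "Units": [8100.0, 8900.0, 9700.0, 10500.0, np.nan]},
    index=pd.to_datetime(["2010-12-31", "2011-12-31", "2012-12-31",
                          "2013-12-31", "2014-12-31"]),
)

train = merged2.iloc[:-1]  # exclude the current year, as in the pd.ols call
# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(train["lastqu"], train["Units"], deg=1)

# Same forecast formula as above: beta['x'] * x + beta['intercept']
yrahead = slope * merged2["lastqu"].iloc[-1] + intercept
```

The toy values lie exactly on Units = 4 * lastqu + 100, so the slope and intercept are easy to check by eye.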

I needed to switch to statsmodels to get some additional functionality (mainly the residual plots; see question here).

So now I have:

import statsmodels.api as sm

def fit_line2(x, y):
    """Return the fitted OLS results for y regressed on x plus an intercept."""
    X = sm.add_constant(x, prepend=True)  # add a column of ones so the intercept is estimated
    model = sm.OLS(y, X, missing='drop').fit()
    return model

And:

model = fit_line2(merged2[:-1].lastqu, merged2[:-1].Units)
print(model.summary())

But I cannot get

yrahead2=model.predict(merged2.lastqu[-1:]) 

or any variant of it to give me a prediction. Note that pd.ols uses the same merged2.lastqu[-1:] to get the data I want to predict from. No matter what I pass to predict(), I'm not having any joy. It seems statsmodels wants something specific as the argument, other than a single pandas value; I even tried passing a plain number, e.g. 2696, but still nothing. My current error is:

----> 3 yrahead2=model.predict(merged2.lastqu[-1:])

/usr/lib/pymodules/python2.7/statsmodels/base/model.pyc in predict(self, exog, transform, *args, **kwargs)
   1004             exog = np.atleast_2d(exog) # needed in count model shape[1]
   1005 
-> 1006         return self.model.predict(self.params, exog, *args, **kwargs)
   1007 
   1008 

/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.pyc in predict(self, params, exog)
    253         if exog is None:
    254             exog = self.exog
--> 255         return np.dot(exog, params)
    256 
    257 class GLS(RegressionModel):

ValueError: objects are not aligned

> /usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py(255)predict()
    254             exog = self.exog
--> 255         return np.dot(exog, params)
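The traceback ends in np.dot(exog, params). This NumPy-only sketch of that step uses made-up coefficients. It shows why a single lastqu value fails against a two-parameter (const, lastqu) model, and why prepending a column of ones fixes it. Older NumPy releases word this error as "objects are not aligned"; newer ones report the mismatched shapes instead.

```python
import numpy as np

params = np.array([100.0, 4.0])               # hypothetical [const, lastqu] coefficients
exog_missing_const = np.atleast_2d(2651.0)    # shape (1, 1): one value, no constant
exog_with_const = np.array([[1.0, 2651.0]])   # shape (1, 2): constant column prepended

error_message = None
try:
    np.dot(exog_missing_const, params)        # (1, 1) . (2,): inner dimensions differ
except ValueError as err:
    error_message = str(err)

prediction = np.dot(exog_with_const, params)  # 1 * 100 + 2651 * 4
```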
    256 

2 Answers


I prefer the formula API for statsmodels. At least with that API, model.fit().predict wants a DataFrame whose columns have the same names as the predictors. Here's an example:

In [2]: df = pd.DataFrame({'X': np.arange(10), 'Y': np.arange(10) + np.random.randn(10)})

In [3]: mod = sm.OLS.from_formula("Y ~ X", df)

In [4]: res = mod.fit()

In [5]: exog = pd.DataFrame({"X": np.linspace(0, 10, 100)})

In [6]: res.predict(exog)
Out[6]: 
array([ 0.99817045,  1.07854804,  1.15892563,  1.23930322,  1.31968081,
        1.40005839,  1.48043598,  1.56081357,  1.64119116,  1.72156875,
        1.80194634,  1.88232393,  1.96270152,  2.04307911,  2.1234567 ,
        2.20383429,  2.28421188,  2.36458947,  2.44496706,  2.52534465,
        2.60572224,  2.68609983,  2.76647742,  2.84685501,  2.92723259,
        3.00761018,  3.08798777,  3.16836536,  3.24874295,  3.32912054,
        3.40949813,  3.48987572,  3.57025331,  3.6506309 ,  3.73100849,
        3.81138608,  3.89176367,  3.97214126,  4.05251885,  4.13289644,
        4.21327403,  4.29365162,  4.3740292 ,  4.45440679,  4.53478438,
        4.61516197,  4.69553956,  4.77591715,  4.85629474,  4.93667233,
        5.01704992,  5.09742751,  5.1778051 ,  5.25818269,  5.33856028,
        5.41893787,  5.49931546,  5.57969305,  5.66007064,  5.74044823,
        5.82082582,  5.9012034 ,  5.98158099,  6.06195858,  6.14233617,
        6.22271376,  6.30309135,  6.38346894,  6.46384653,  6.54422412,
        6.62460171,  6.7049793 ,  6.78535689,  6.86573448,  6.94611207,
        7.02648966,  7.10686725,  7.18724484,  7.26762243,  7.34800002,
        7.4283776 ,  7.50875519,  7.58913278,  7.66951037,  7.74988796,
        7.83026555,  7.91064314,  7.99102073,  8.07139832,  8.15177591,
        8.2321535 ,  8.31253109,  8.39290868,  8.47328627,  8.55366386,
        8.63404145,  8.71441904,  8.79479663,  8.87517421,  8.9555518 ])
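Applied to the question's setup, the formula approach might look like this sketch. The values are toy data; only the column names Units and lastqu come from the question. statsmodels.formula.api.ols is the lowercase formula entry point, equivalent to sm.OLS.from_formula. Selecting the last row with a list, merged2.iloc[[-1]], keeps it as a one-row DataFrame that still has the lastqu column name, which is what predict() uses to rebuild the design matrix.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data using the question's column names
merged2 = pd.DataFrame({"lastqu": [2000.0, 2200.0, 2400.0, 2600.0, 2651.0],
                        "Units": [8100.0, 8900.0, 9700.0, 10500.0, np.nan]})

# The formula adds the intercept itself, so no add_constant call is needed
res = smf.ols("Units ~ lastqu", data=merged2.iloc[:-1]).fit()

# iloc[[-1]] (note the inner list) returns a one-row DataFrame, not a scalar,
# so the 'lastqu' column name survives and the formula can be re-applied
yrahead2 = res.predict(merged2.iloc[[-1]])
```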

3 Comments

Thanks, but I am using the plain sm API. What I'm having trouble with is model.predict(merged2.lastqu[-1:]), which is a Series (date 2014-12-31, value 2651, Name: lastqu, dtype: float64). I want to use the 2651 as the exog.
I don't see how using the formula API helps versus plain sm. For my use case, with already constructed DataFrames inside a function, I'm not sure how to set that up. Surely there is a way to get predict() to accept the DataFrame cell? I am only trying to predict one period ahead.
Hmm, is there some way to get the model's names and rename the predictor, if that is the issue? Grasping here.

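Your merged2.lastqu[-1:] doesn't contain the constant. The model was fitted with two columns, const and lastqu, so np.dot(exog, params) cannot align a single column of data with the two parameters. Adding the constant in the same way as in fit_line2,

yrahead2=model.predict(sm.add_constant(merged2.lastqu[-1:], prepend=True))

should work. Note that newer statsmodels releases default to has_constant='skip', which leaves out the column of ones when the input already contains a constant column. In a single row every column looks constant, so there you need sm.add_constant(merged2.lastqu[-1:], prepend=True, has_constant='add').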

An alternative is to add the constant to the DataFrame in the same way as to X in the model, and then pass predict() the matching columns of that DataFrame, e.g. df[['const', my_other_X]].
