I wrote the following piece of code but I just cannot get the 'predict' method to work:

import statsmodels.api as sm
from statsmodels.formula.api import ols
ols_model = ols('Consumption ~ Disposable_Income', df).fit()

My 'df' is a pandas dataframe with column headings 'Consumption' and 'Disposable_Income'. When I run, for example,

ols_model.predict([1000.0])

I get: "TypeError: list indices must be integers, not str"

When I run, for example,

ols_model.predict(df['Disposable_Income'].values)

I get: "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices"

I'm very confused because I thought these two formats are precisely what the documentation says - put in an array of values for the x variable. How exactly am I supposed to use the 'predict' method?

This is what my df looks like: (image not included)

  • Could you post df.head()? Commented Nov 2, 2015 at 11:50
  • @WoodChopper ok, see edited post Commented Nov 2, 2015 at 11:54
  • ~ is a different operator in pandas than in R. Commented Nov 2, 2015 at 12:03
  • Please provide full tracebacks, or at least the last few lines, so we can see where the exception was raised. The first exception might be a bug in patsy for different kinds of integers that has already been fixed, but I don't know if the fix has been released yet. Commented Nov 2, 2015 at 14:13

2 Answers

Since you built the model from a formula, the formula information is also used to interpret the exog passed to predict.

I think you need to pass a dataframe or a dictionary that uses the exact name(s) of the explanatory variable(s) from the formula.

ols_model.predict({'Disposable_Income':[1000.0]})

or something like

import pandas as pd
df_predict = pd.DataFrame([[1000.0]], columns=['Disposable_Income'])
ols_model.predict(df_predict)
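
To make this concrete, here is a minimal end-to-end sketch. The data is made up (the question's actual df is not shown) and is built to be exactly linear, Consumption = 50 + 0.8 * Disposable_Income, so the fitted line should reproduce those values up to floating-point error. The names new_data, predictions and predictions_from_dict are chosen only for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data standing in for the asker's df (exactly linear):
# Consumption = 50 + 0.8 * Disposable_Income
df = pd.DataFrame({"Disposable_Income": [100.0, 200.0, 300.0, 400.0, 500.0]})
df["Consumption"] = 50.0 + 0.8 * df["Disposable_Income"]

ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# New observations must use the column name that appears in the formula
new_data = pd.DataFrame({"Disposable_Income": [10.0, 1000.0]})
predictions = ols_model.predict(new_data)

# A dict keyed by the same name is handled the same way
predictions_from_dict = ols_model.predict({"Disposable_Income": [10.0, 1000.0]})

print(predictions)
```

With this made-up data, the prediction at 1000 should come out at about 850, since 50 + 0.8 * 1000 = 850.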

Another option is to bypass the formula handling in predict, provided the full design matrix for prediction, including the constant, is available.

AFAIR, this should also work:

ols_model.predict([[1, 1000.0]], transform=False)
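
For intuition about why the leading 1 is needed, here is a plain-Python sketch (not statsmodels code). With transform=False no formula processing happens, so each row has to line up with the fitted parameters, constant first. The coefficient values and the helper predict_from_design_rows are hypothetical, made up for this illustration.

```python
# Hypothetical fitted coefficients, in design-matrix column order:
# [Intercept, Disposable_Income]
params = [50.0, 0.8]


def predict_from_design_rows(params, rows):
    """Return the dot product of params with each design-matrix row."""
    return [sum(p * x for p, x in zip(params, row)) for row in rows]


# Each row is [constant, Disposable_Income]; the 1 multiplies the intercept
design_rows = [[1, 10.0], [1, 1000.0]]
predictions = predict_from_design_rows(params, design_rows)
print(predictions)
```

Leaving out the 1 would drop the intercept term from the sum, which is why the constant column has to be supplied by hand when the formula is skipped.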

1 Comment

You are right - it works! I totally didn't know that I need to use a dictionary syntax. Thanks for teaching me something new

Not sure if this is the best approach, but after lots and lots of fiddling around, I got this code to work (it seems a bit clumsy and inefficient):

Say I want to predict the value at X=10 and X=1000:

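import statsmodels.api as sm
from statsmodels.formula.api import ols
ols_model = ols('Consumption ~ Disposable_Income', df).fit()
# Unfitted model: its predict takes the parameters and a design matrix explicitly
regressor = ols('Consumption ~ Disposable_Income', df)
# Each row is [constant, Disposable_Income]
regressor.predict(ols_model.params, exog=[[1,10],[1,1000]])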
