I currently have the following code, which does a polynomial regression on a dataset with 4 variables:

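from numpy import genfromtxt, savetxt
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

def polyreg():
    # [1:] drops the header row, which genfromtxt parses as NaN
    dataset = genfromtxt(open('train.csv','r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]   # first column is the response
    train = [x[1:] for x in dataset]   # remaining columns are the predictors
    test = genfromtxt(open('test.csv','r'), delimiter=',', dtype='f8')[1:]

    poly = PolynomialFeatures(degree=2)
    train_poly = poly.fit_transform(train)
    # reuse the transformer fitted on the training data instead of refitting it
    test_poly = poly.transform(test)

    clf = linear_model.LinearRegression()
    clf.fit(train_poly, target)

    savetxt('polyreg_test1.csv', clf.predict(test_poly), delimiter=',', fmt='%f')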

Is there a way to output a summary of the regression, like the one Excel produces? I explored the attributes and methods of linear_model.LinearRegression() but couldn't find anything.

1 Answer

This is not implemented in scikit-learn; the scikit-learn ecosystem is quite biased towards using cross-validation for model evaluation (this is a good thing in my opinion; most of the test statistics were developed out of necessity before computers were powerful enough for cross-validation to be feasible).
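As a rough illustration of the cross-validation route, here is a minimal sketch. It assumes synthetic data standing in for train.csv (four predictors, made-up coefficients), so the numbers it prints say nothing about the data in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the question's data (an assumption):
# 4 predictors and a response with a quadratic component.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Keeping the feature expansion inside the pipeline means it is
# refit on the training part of every fold.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# 5-fold cross-validation; regressors are scored with R^2 by default.
scores = cross_val_score(model, X, y, cv=5)
print("R^2 per fold:", scores)
print("mean R^2:", scores.mean())
```

The per-fold scores estimate out-of-sample fit, which is a different question from the in-sample test statistics in an Excel-style summary.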

For more traditional types of statistical analysis you can use statsmodels. Here is an example taken from their documentation:

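import numpy as np
import statsmodels.api as sm

# simulate a quadratic relationship with Gaussian noise
nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])   # true coefficients: const, x, x**2
e = np.random.normal(size=nsample)

# statsmodels does not add an intercept by default
X = sm.add_constant(X)
y = np.dot(X, beta) + e

# fit ordinary least squares and print the summary table
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
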
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.020e+06
Date:                Sun, 01 Feb 2015   Prob (F-statistic):          2.83e-239
Time:                        09:32:32   Log-Likelihood:                -146.51
No. Observations:                 100   AIC:                             299.0
Df Residuals:                      97   BIC:                             306.8
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.3423      0.313      4.292      0.000         0.722     1.963
x1            -0.0402      0.145     -0.278      0.781        -0.327     0.247
x2            10.0103      0.014    715.745      0.000         9.982    10.038
==============================================================================
Omnibus:                        2.042   Durbin-Watson:                   2.274
Prob(Omnibus):                  0.360   Jarque-Bera (JB):                1.875
Skew:                           0.234   Prob(JB):                        0.392
Kurtosis:                       2.519   Cond. No.                         144.
==============================================================================