3

I am quite new to Python. I would like to get a summary of a logistic regression like in R. I have created the variables x_train and y_train, and I am trying to fit a logistic regression:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

clf = linear_model.LogisticRegression(C=1e5)
clf.fit(x_train, y_train)

What I get is:

LogisticRegression(C=100000.0, class_weight=None, dual=False,
    fit_intercept=True, intercept_scaling=1, max_iter=100,
    multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
    solver='liblinear', tol=0.0001, verbose=0, warm_start=False)

I would like to have a summary with significance levels, R², etc.

4 Answers

7

I'd recommend taking a look at the statsmodels library. Sk-learn is great (and the other answers provide ways to get at R2 and other metrics), but statsmodels provides a regression summary very similar to the one you're probably used to in R.

As an example:

import statsmodels.api as sm
from sklearn.datasets import make_blobs

x, y = make_blobs(n_samples=50, n_features=2, cluster_std=5.0,
                  centers=[(0,0), (2,2)], shuffle=False, random_state=12)

logit_model = sm.Logit(y, sm.add_constant(x)).fit()
print(logit_model.summary())

Optimization terminated successfully.
         Current function value: 0.620237
         Iterations 5
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                          Logit   Df Residuals:                       47
Method:                           MLE   Df Model:                            2
Date:                Wed, 28 Dec 2016   Pseudo R-squ.:                  0.1052
Time:                        12:58:10   Log-Likelihood:                -31.012
converged:                       True   LL-Null:                       -34.657
                                        LLR p-value:                   0.02611
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.0813      0.308     -0.264      0.792        -0.684     0.522
x1             0.1230      0.065      1.888      0.059        -0.005     0.251
x2             0.1104      0.060      1.827      0.068        -0.008     0.229
==============================================================================

If you want to add regularization, instead of calling .fit() after the Logit initialization you can call .fit_regularized() and pass in an alpha parameter (regularization strength). If you do this, remember that the C parameter in sk-learn is actually the inverse of regularization strength.


1
  1. To obtain significance levels you can use sklearn.feature_selection.f_regression.

  2. To obtain R² you can use sklearn.metrics.r2_score.
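A minimal sketch of both calls on synthetic data (the data and coefficients here are made up purely for illustration):

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.metrics import r2_score

# Synthetic data: y depends strongly on the first feature only.
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = 1.5 * X[:, 0] + rng.randn(100) * 0.5

# Univariate F statistics and p-values for each feature.
f_stats, p_values = f_regression(X, y)
print(p_values)

# R^2 between true values and some predictions.
y_pred = 1.5 * X[:, 0]
r2 = r2_score(y, y_pred)
print(r2)
```

Note that f_regression runs univariate linear tests per feature, so its p-values are not the same as the coefficient p-values in a full multivariate model summary.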


1
import statsmodels.api as sm      
x_train1 = sm.add_constant(x_train1)
lm_1 = sm.OLS(y_train, x_train1).fit()
lm_1.summary()

This is a very useful package for those who are used to R's model summaries.

For more info refer below articles:

  1. statsmodels.api
  2. stats-models-vs-sklearn


0

You can call clf.score(test_samples, true_values) to get a goodness-of-fit score; note that for classifiers such as LogisticRegression this returns mean accuracy, not R² (.score returns R² only for regressors).

Significance is not directly provided by sklearn, but have a look at the answer here and this code.
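For a classifier like LogisticRegression, a quick sketch of what .score reports (mean accuracy on held-out data; the dataset here is synthetic, for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
# For classifiers, .score returns the mean accuracy on the given data.
acc = clf.score(X_te, y_te)
print(acc)
```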

