Wrong plot in logistic regression

Question

I am trying to implement logistic regression, but the plot I get is wrong.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# train_test_split lives in sklearn.model_selection
# (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
sns.set()

# 400 random integer inputs as a column vector, and 400 random 0/1 labels
x = np.random.randint(2000, size=400).reshape((400, 1))
y = np.random.randint(2, size=400)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

logistic_regr = LogisticRegression()
logistic_regr.fit(x_train, y_train)

fig, ax = plt.subplots()

ax.set(xlabel='x', ylabel='y')
ax.plot(x_test, logistic_regr.predict_proba(x_test), label='Logistic regr')
#ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')
ax.legend()

And I am receiving the following plot:

If I use:

ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')

I am receiving:

Your regression always predicts 0, which is why you get this plot. Your training data is completely random and your target contains only 0 and 1, yet you want it to behave like a linear regression. So the regression is a line and it predicts either always 0 or always 1.
– MMF, Commented Dec 9, 2016 at 17:16
@MMF: Hmm, right! My target must lie in [0, 1] since it is a probability. If I use np.linspace(0,1,400).ravel() as the target, it throws "Unknown label type".
– George, Commented Dec 9, 2016 at 17:18
But the problem is that you only have either 0 or 1, not values in between. np.random.randint() returns only integers.
– MMF, Commented Dec 9, 2016 at 17:20
@MMF: Using logistic_regr.predict_proba, shouldn't it find a probability in [0, 1], regardless of my target?
– George, Commented Dec 9, 2016 at 17:22

nullop · Accepted Answer · 2016-12-12 19:35:49Z

Well, you will not get a graph of a sigmoid function with this particular choice of data. Because your input is random, the algorithm finds a separation between the classes that predicts probabilities close to 0.5, with variations depending on the randomness of your input. You can get a sigmoid by using an evenly spaced range of values in which the first half belongs to the first class and the second half belongs to the second class. Then predict_proba() will output probabilities for a given class that vary from 0 to 1 (I assume the rest of your code stays the same):

x = np.linspace(-2, 2, 400).reshape((400,1))
y = np.concatenate((np.zeros(200), np.ones(200)))  # 1-D labels: 200 zeros, then 200 ones

then generate your graph:

ax.plot(x_test, logistic_regr.predict_proba(x_test)[:,1], '.', label='Logistic regr')

You will get a sigmoid-shaped plot describing the probability of predicting one of the classes:
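Putting the pieces together, a minimal self-contained sketch might look like the following. It assumes a recent scikit-learn, where train_test_split is imported from sklearn.model_selection. The names model and p_class1 are illustrative and do not appear in the original post. Plotting is left out so the snippet only shows how the probabilities are obtained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 400 evenly spaced inputs; the first half is class 0, the second half class 1.
x = np.linspace(-2, 2, 400).reshape(-1, 1)
y = np.concatenate((np.zeros(200), np.ones(200)))

# Same split parameters as in the question.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

model = LogisticRegression()
model.fit(x_train, y_train)

# predict_proba returns one column per class; column 1 is P(class 1).
p_class1 = model.predict_proba(x_test)[:, 1]

print(p_class1.min(), p_class1.max())
```

Passing x_test and p_class1 to ax.plot with the '.' marker, as in the line above, should then show the points falling along an S-shaped curve.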

answered Dec 12, 2016 at 19:35


5 Comments

George Over a year ago

OK, thanks! It also seems that without the '.' in the plot command the plot is messy. (Also, if you can help with this: stackoverflow.com/questions/41043348/…) Thanks!

nullop Over a year ago

If you sort your x_test array by calling x_test.sort(axis=0) before passing it to the predict_proba() function, you will get a smooth plot.
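To illustrate why sorting helps: ax.plot joins points in the order they appear in the arrays, so shuffled x values make the line zigzag back and forth. The sketch below is a variation on the in-place sort suggested above. It uses np.argsort so that x and the probabilities are reordered together. The data is synthetic, and a hand-written logistic curve with an assumed slope of 4 stands in for predict_proba(x_test)[:, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shuffled test inputs, shaped like x_test in the question.
x_test = rng.uniform(-2, 2, size=(160, 1))

# Stand-in for logistic_regr.predict_proba(x_test)[:, 1]:
# a logistic curve with an assumed slope of 4.
p = 1.0 / (1.0 + np.exp(-4.0 * x_test.ravel()))

# Indices that put the x values in ascending order.
order = np.argsort(x_test.ravel())

# Reorder x and p with the same indices so each pair stays matched.
x_sorted = x_test[order]
p_sorted = p[order]

# ax.plot(x_sorted, p_sorted) would now draw a single left-to-right line.
```

Sorting x_test in place before predicting, as suggested above, has the same effect, since the predictions then come out already in ascending x order.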

George Over a year ago

Hmm, OK, it works! So I guess it is good practice to sort things before plotting.

George Over a year ago

One question, though: if we plot logistic_regr.predict_proba(x_test) without the [:,1], we get two sigmoids. Is this how logistic regression generates the data? Thanks! (upvoted)

nullop Over a year ago

Logistic regression uses the sigmoid to map its output into a range of values that is convenient to interpret as probability estimates. The sigmoid is just a tool; it is not the purpose of logistic regression. We may not always get a proper sigmoid-shaped output. My example is a very specific case where the probability of predicting one particular class varies from 0 to 1. This gives a clear sigmoid shape, but real examples may have a very different graph of predicted values.
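On the "two sigmoids" question, a small check may help. In the binary case predict_proba returns two columns, P(class 0) and P(class 1), which sum to 1, so plotting both gives two mirror-image curves. The sketch below assumes scikit-learn's default LogisticRegression settings and the evenly split data from the answer. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Evenly split data, as in the answer.
x = np.linspace(-2, 2, 400).reshape(-1, 1)
y = np.concatenate((np.zeros(200), np.ones(200)))

model = LogisticRegression().fit(x, y)

# One column per class: proba[:, 0] is P(class 0), proba[:, 1] is P(class 1).
proba = model.predict_proba(x)

# Linear score w*x + b for each sample, before the sigmoid is applied.
scores = model.decision_function(x)
sigmoid = 1.0 / (1.0 + np.exp(-scores))

# The two columns are complementary, and column 1 is the sigmoid of the score.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
col1_is_sigmoid = np.allclose(proba[:, 1], sigmoid)
print(rows_sum_to_one, col1_is_sigmoid)
```

So the second curve carries no extra information: it is 1 minus the first, which is why selecting [:, 1] is enough for the plot.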
