
I am trying to plot the decision boundary for binary classification with logistic regression, but I don't quite understand how it should be done.

Here is a data set which I have generated, and on which I apply logistic regression with numpy:

import numpy as np
import matplotlib.pyplot as plt


# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000

# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000

# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)

X = np.concatenate((r0,r1))

(image: scatter plot of the generated data)
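For reference, a scatter plot like the one in the image can be reproduced from the arrays above (a minimal sketch, not necessarily the original plotting code):

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red', label='class 0')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue', label='class 1')
plt.legend()
plt.show()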

After applying logistic regression, I found that the best thetas are:

thetas = [1.2182441664666837, 1.3233825647558795, -0.6480886684022018]
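The fitting code is not shown above; a minimal numpy gradient-descent sketch that could produce such thetas (an assumption, not the code actually used in the question) would be:

# labels: 0 for class 0, 1 for class 1 (an assumed convention)
y = np.concatenate((np.zeros(m0), np.ones(m1)))
# prepend a bias column so thetas[0] acts as the intercept
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(3)
lr = 0.1
for _ in range(10000):
    # gradient of the average negative log-likelihood
    grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
    theta -= lr * grad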

I tried to plot the decision boundary the following way:

yy = -(thetas[0] + thetas[1]*X)/thetas[1][2]
plt.plot(X,yy)

However, the graph that comes out has the opposite slope from what I expected: (image: resulting plot)

Thanks in advance

1 Answer


I think you made two errors:

  • yy = -(thetas[0] + thetas[1]*X)/thetas[1][2]

    Why thetas[1][2] instead of thetas[2]?

  • And why transform X, which is your complete dataset?

You can apply the transformation to just the minimum and maximum x:

minx = np.min(X[:, 0])
maxx = np.max(X[:, 0])

## compute the transformation:

y1 = -(thetas[0] + thetas[1]*minx) / thetas[2]
y2 = -(thetas[0] + thetas[1]*maxx) / thetas[2]

## then plot the line [(minx, y1), (maxx, y2)]

plt.plot([minx, maxx], [y1, y2], c='black')
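Since thetas[0] + thetas[1]*x + thetas[2]*y = 0 is a straight line, two endpoints are enough; equivalently, you could evaluate the same expression on a dense range of x values (a sketch using the variables above):

xs = np.linspace(minx, maxx, 50)
ys = -(thetas[0] + thetas[1] * xs) / thetas[2]
plt.plot(xs, ys, c='black')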

Complete working code with sklearn LogisticRegression:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Your job:
# =============

# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000

# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000

# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)

X = np.concatenate((r0,r1))

## Added lines :

Y = np.concatenate((np.zeros(m0), np.ones(m1)))

model = LogisticRegression().fit(X,Y)

# collect [intercept, coef1, coef2] in one list
coefs = list(model.intercept_)
coefs.extend(model.coef_[0].tolist())
xmin = np.min(X[:, 0])
xmax = np.max(X[:, 0])


def bound(x):
    # solve coefs[0] + coefs[1]*x + coefs[2]*y = 0 for y
    return -(coefs[0] + coefs[1] * x) / coefs[2]

p1 = np.array([xmin, bound(xmin)])
p2 = np.array([xmax, bound(xmax)])

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue')
plt.plot([p1[0], p2[0]], [p1[1], p2[1]], c='black')
plt.show()
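As an alternative sketch (not part of the original answer), the boundary can also be drawn without solving for y, by evaluating the fitted model on a grid and contouring at probability 0.5:

# evaluate the model's probability for class 1 on a dense grid
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
probs = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue')
# the 0.5 probability contour is the decision boundary
plt.contour(xx, yy, probs, levels=[0.5], colors='black')
plt.show()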

Update

(image: final plot)


2 Comments

Hey, thanks for your answer! The slope of the line, though, is the same as the one I posted above. Shouldn't the slope be negative?
Hey, the slope is negative because if you consider beta0 as your intercept, beta1 and beta2 are both negative, so when you do the transformation -(coefs[0] + coefs[1] * x) / coefs[2], the slope becomes -(coefs[1]/coefs[2]), which is negative.
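As an aside (not part of the original thread), the sign is easy to check numerically; with the thetas from the question the slope comes out positive:

thetas = [1.2182441664666837, 1.3233825647558795, -0.6480886684022018]
print(-thetas[1] / thetas[2])   # ≈ 2.04, i.e. a positive slope for these thetas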
