
I am trying to plot the decision boundary for binary classification with logistic regression, but I don't quite understand how it should be done.

Here is a data set which I have generated, and on which I apply logistic regression with numpy:

import numpy as np
import matplotlib.pyplot as plt


# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000

# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000

# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)

X = np.concatenate((r0,r1))

(image: scatter plot of the generated data)
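For reference, a scatter plot like the one in the image can be reproduced from the arrays above (a minimal sketch, not necessarily the original plotting code):

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red', label='class 0')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue', label='class 1')
plt.legend()
plt.show()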

After applying logistic regression, I found that the best thetas are:

thetas = [1.2182441664666837, 1.3233825647558795, -0.6480886684022018]
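The fitting code is not shown above; a minimal numpy gradient-descent sketch that could produce such thetas (an assumption, not the code actually used in the question) would be:

# labels: 0 for class 0, 1 for class 1 (an assumed convention)
y = np.concatenate((np.zeros(m0), np.ones(m1)))
# prepend a bias column so thetas[0] acts as the intercept
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(3)
lr = 0.1
for _ in range(10000):
    # gradient of the average negative log-likelihood
    grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
    theta -= lr * grad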

I tried to plot the decision boundary the following way:

yy = -(thetas[0] + thetas[1]*X)/thetas[1][2]
plt.plot(X,yy)

However, the graph that comes out has the opposite slope from what I expected: (image: resulting plot)

Thanks in advance

1 Answer


I think you made two errors:

  • yy = -(thetas[0] + thetas[1]*X)/thetas[1][2]

    Why thetas[1][2] instead of thetas[2]?

  • And why transform X, which is your complete dataset?

You can apply the transformation to just the minimum and maximum x:

minx = np.min(X[:, 0])
maxx = np.max(X[:, 0])

## compute the transformation:

y1 = -(thetas[0] + thetas[1]*minx) / thetas[2]
y2 = -(thetas[0] + thetas[1]*maxx) / thetas[2]

## then plot the line [(minx, y1), (maxx, y2)]

plt.plot([minx, maxx], [y1, y2], c='black')
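Since thetas[0] + thetas[1]*x + thetas[2]*y = 0 is a straight line, two endpoints are enough; equivalently, you could evaluate the same expression on a dense range of x values (a sketch using the variables above):

xs = np.linspace(minx, maxx, 50)
ys = -(thetas[0] + thetas[1] * xs) / thetas[2]
plt.plot(xs, ys, c='black')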

Complete working code with sklearn LogisticRegression:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Your job:
# =============

# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000

# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000

# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)

X = np.concatenate((r0,r1))

## Added lines :

Y = np.concatenate((np.zeros(m0), np.ones(m1)))

model = LogisticRegression().fit(X,Y)

# collect [intercept, coef1, coef2] in one list
coefs = list(model.intercept_)
coefs.extend(model.coef_[0].tolist())
xmin = np.min(X[:, 0])
xmax = np.max(X[:, 0])


def bound(x):
    # solve coefs[0] + coefs[1]*x + coefs[2]*y = 0 for y
    return -(coefs[0] + coefs[1] * x) / coefs[2]

p1 = np.array([xmin, bound(xmin)])
p2 = np.array([xmax, bound(xmax)])

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue')
plt.plot([p1[0], p2[0]], [p1[1], p2[1]], c='black')
plt.show()
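As an alternative sketch (not part of the original answer), the boundary can also be drawn without solving for y, by evaluating the fitted model on a grid and contouring at probability 0.5:

# evaluate the model's probability for class 1 on a dense grid
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
probs = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.plot(r0[:, 0], r0[:, 1], ls='', marker='.', c='red')
plt.plot(r1[:, 0], r1[:, 1], ls='', marker='.', c='blue')
# the 0.5 probability contour is the decision boundary
plt.contour(xx, yy, probs, levels=[0.5], colors='black')
plt.show()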

Update

(image: final plot)


2 Comments

Hey, thanks for your answer! The slope of the line, though, is the same as the one I posted above. Shouldn't the slope be negative?
Hey, the slope is negative because if you consider beta0 as your intercept, beta1 and beta2 are both negative, so when you do the transformation -(coefs[0] + coefs[1] * x) / coefs[2], the slope becomes -(coefs[1]/coefs[2]), which is negative.
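As an aside (not part of the original thread), the sign is easy to check numerically; with the thetas from the question the slope comes out positive:

thetas = [1.2182441664666837, 1.3233825647558795, -0.6480886684022018]
print(-thetas[1] / thetas[2])   # ≈ 2.04, i.e. a positive slope for these thetas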
