
I am new to machine learning with Python. I've managed to draw a straight decision boundary for logistic regression using matplotlib. However, I am having difficulty plotting a curved decision boundary, which I want in order to understand overfitting on a sample dataset.

I am trying to build a logistic regression model and use regularization to control overfitting on my dataset.

I am aware of the scikit-learn library, but I prefer to write the code myself.
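Since the goal is to write it without scikit-learn, here is a minimal sketch of the L2-regularized logistic regression cost and gradient in plain NumPy. The function name `cost_and_grad` is hypothetical, and it assumes a design matrix whose first column is all ones (so `theta[0]` is the intercept, which is conventionally left unpenalized).

```python
import numpy as np

def cost_and_grad(theta, X, y, lam):
    """L2-regularized logistic regression: returns (cost, gradient).

    X   : (m, n) design matrix whose first column is all ones (intercept)
    y   : (m,) array of 0/1 labels
    lam : regularization strength; theta[0] (the intercept) is not penalized
    """
    m = len(y)
    z = X @ theta
    # cross-entropy -[y log h + (1-y) log(1-h)] with h = sigmoid(z),
    # rewritten as log(1 + e^z) - y*z so it never takes log(0)
    cost = np.mean(np.logaddexp(0.0, z) - y * z)
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    h = 1.0 / (1.0 + np.exp(-z))  # predicted probabilities
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]  # penalty gradient, intercept excluded
    return cost, grad
```

Because it returns both values, it can be passed to `scipy.optimize.minimize(cost_and_grad, np.zeros(n), args=(X, y, lam), jac=True)` or used in your own gradient-descent loop. A larger `lam` shrinks the coefficients and smooths the boundary; a value near 0 lets it overfit.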

The sample data I am working with is given below:

import numpy as np

x = np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650')
y = np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0')

The decision boundary I am expecting is given in the graph below:

[Image: expected curved decision boundary]
Any help would be appreciated.

I could plot a straight decision boundary using the code below:

import matplotlib.pyplot as plt

# scatter plot of the two raw features (X holds them in columns 0 and 1)
plt.figure()
pos = np.where(y == 1)
neg = np.where(y == 0)

plt.plot(X[pos[0], 0], X[pos[0], 1], 'ro')
plt.plot(X[neg[0], 0], X[neg[0], 1], 'bo')
plt.xlim([X[:, 0].min(), X[:, 0].max()])
plt.ylim([X[:, 1].min(), X[:, 1].max()])
plt.show()

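# plot of the decision boundary
# (x here has a leading column of ones, so the features are in columns 1 and 2;
#  theta_NM = [theta0, theta1, theta2] is the fitted 1-D parameter array)
plt.figure()
pos = np.where(y == 1)
neg = np.where(y == 0)

plt.plot(x[pos[0], 1], x[pos[0], 2], 'ro')
plt.plot(x[neg[0], 1], x[neg[0], 2], 'bo')
plt.xlim([x[:, 1].min() - 2, x[:, 1].max() + 2])
plt.ylim([x[:, 2].min() - 2, x[:, 2].max() + 2])

# extend the line 2 units beyond the data on each side
plot_x = np.array([x[:, 1].min() - 2, x[:, 1].max() + 2])
# on the boundary theta0 + theta1*x1 + theta2*x2 = 0, so solve for x2
plot_y = (-1 / theta_NM[2]) * (theta_NM[1] * plot_x + theta_NM[0])
plt.plot(plot_x, plot_y)
plt.show()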
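For reference, the straight line comes from the point where the predicted probability equals 0.5. Assuming `theta_NM` holds $[\theta_0, \theta_1, \theta_2]$ with $\theta_2 \neq 0$:

```latex
h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2)}} \ge 0.5
\quad\Longleftrightarrow\quad
\theta_0 + \theta_1 x_1 + \theta_2 x_2 \ge 0,
\qquad
\text{boundary: } x_2 = -\frac{1}{\theta_2}\left(\theta_1 x_1 + \theta_0\right)
```

This is exactly what `plot_y` computes with $x_1$ = `plot_x`. With polynomial features the boundary $\theta^T \phi(x_1, x_2) = 0$ generally cannot be solved for $x_2$ this way, which is why a contour plot of $\theta^T \phi$ over a grid is the usual way to draw a curved boundary.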

And my decision boundary looks like this:

[Image: the resulting straight decision boundary]

In an ideal scenario the decision boundary above is good, but I would like to plot a curved decision boundary that fits my training data very well and therefore overfits (i.e., generalizes poorly to test data), similar to the one shown in the first plot.
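A common way to get a curved boundary (a different technique from the nearest-point gridding in the answer below) is to map the two features to polynomial terms, fit regularized logistic regression on those terms, and draw the zero-level contour of $\theta^T \phi(x)$ over a grid. The sketch below assumes a degree-6 mapping, standardized features (raising `x2 ~ 700` to the 6th power would otherwise wreck the optimization), and plain gradient descent with a step of `1/L`, where `L` is an upper bound on the curvature. The names `map_feature`, `fit_regularized` and `plot_boundary` are hypothetical.

```python
import numpy as np

def map_feature(x1, x2, degree=6):
    """Polynomial features [1, x1, x2, x1^2, x1*x2, x2^2, ..., x2^degree]."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    cols = [np.ones_like(x1)]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(x1 ** (i - j) * x2 ** j)
    return np.column_stack(cols)

def logistic_loss(theta, X, y):
    """Mean cross-entropy, written stably as log(1 + e^z) - y*z."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def fit_regularized(X, y, lam, iters=20000):
    """Gradient descent on the L2-regularized loss (intercept not penalized).

    Step size 1/L, with L bounding the Hessian, so no step increases the
    regularized objective.
    """
    m, n = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4 * m) + lam / m
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (h - y) / m
        grad[1:] += lam / m * theta[1:]
        theta -= grad / L
    return theta

def plot_boundary(theta, raw, y, mu, sigma, degree=6, res=200):
    """Contour theta^T phi(x) = 0 over a grid covering the raw data."""
    import matplotlib.pyplot as plt  # imported here so fitting works without it
    x1s = np.linspace(raw[:, 0].min() - 1, raw[:, 0].max() + 1, res)
    x2s = np.linspace(raw[:, 1].min() - 100, raw[:, 1].max() + 100, res)
    G1, G2 = np.meshgrid(x1s, x2s)
    # scale the grid exactly like the training data before mapping
    Z = map_feature((G1 - mu[0]) / sigma[0], (G2 - mu[1]) / sigma[1], degree) @ theta
    plt.contour(G1, G2, Z.reshape(G1.shape), levels=[0.0], colors="k")
    plt.plot(raw[y == 1, 0], raw[y == 1, 1], "ro")
    plt.plot(raw[y == 0, 0], raw[y == 0, 1], "bo")
    plt.show()

# data from the question, as plain arrays
raw = np.array([[2, 300], [4, 600], [7, 300], [5, 500], [5, 400], [6, 400], [3, 400],
                [4, 500], [1, 200], [3, 400], [7, 700], [3, 550], [2.5, 650]], dtype=float)
y = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0], dtype=float)

# standardize before raising features to high powers
mu, sigma = raw.mean(axis=0), raw.std(axis=0)
scaled = (raw - mu) / sigma
X = map_feature(scaled[:, 0], scaled[:, 1], degree=6)
theta = fit_regularized(X, y, lam=1e-3)  # small lam: tends toward a wiggly, overfit curve
```

To draw it, call `plot_boundary(theta, raw, y, mu, sigma)`. Comparing `lam=1e-3` with something like `lam=1` should show the regularization effect; note that with these ill-conditioned features gradient descent may not fully converge in the given iterations, which itself acts as mild extra regularization.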

  • What is your question/problem? What have you tried? Commented May 4, 2015 at 11:32
  • Hi Julien, I am trying to build a curved decision boundary. I've tried plotting a straight line using matplotlib, but I have no idea how to plot a curve in matplotlib. I am trying out polynomial features to get a curved line. I've made some updates to my question; kindly have a look. Thanks :) Commented May 4, 2015 at 11:50
  • can't you simply use the average between the maximum envelope of the blue points and the minimum envelope of the red points? (although not sure what to do about missing red points (x < 3) and missing blue points (x > 5) in your original plot) Commented May 4, 2015 at 12:22
  • Actually they are not missing; they can be seen when the graph is elongated. I am looking for some kind of contour plot. I can plot the same thing using Octave, but since I am new to Python I have no clue how to proceed. Commented May 4, 2015 at 12:31
  • Not sure what you mean by "they are not missing": unless x and y are not your complete set of data, there is no point for which y = 1 that has an abscissa lower than 3. Commented May 4, 2015 at 12:51

1 Answer


This can be done by gridding the feature space (the plane spanned by the two input features), setting each grid point to the value of the closest data point, and then running a contour plot on this grid.

There are numerous variations, such as setting each grid point to a distance-weighted average of the data values instead of the nearest value, or smoothing the final contour.
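The distance-weighted variant mentioned above can be sketched with inverse-distance weighting, which is one choice among several. The helper `idw_grid` and its `power` parameter are hypothetical; distances are scaled by each axis's data range, as in the nearest-point code that follows, so x and y count equally.

```python
import numpy as np

def idw_grid(xys, vals, X, Y, power=2.0):
    """Inverse-distance-weighted average of vals at every grid point.

    xys  : (N, 2) data coordinates, vals : (N,) 0/1 labels
    X, Y : meshgrid arrays of the same shape
    """
    xr = xys[:, 0].max() - xys[:, 0].min()
    yr = xys[:, 1].max() - xys[:, 1].min()
    # d2 has shape X.shape + (N,): squared scaled distance to every data point
    d2 = ((X[..., None] - xys[:, 0]) / xr) ** 2 + ((Y[..., None] - xys[:, 1]) / yr) ** 2
    # weight = 1 / distance**power; the floor avoids dividing by zero on a data point
    w = 1.0 / np.maximum(d2, 1e-12) ** (power / 2)
    return (w * vals).sum(axis=-1) / w.sum(axis=-1)
```

The result lies between 0 and 1, so `plt.contour(X, Y, idw_grid(xys, vals, X, Y), levels=[.5])` can replace the nearest-point step and typically gives a smoother curve.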

Here's an example for finding the initial contour:

[Image: contour produced by the code below]

import numpy as np
import matplotlib.pyplot as plt

# get the data as numpy arrays
xys = np.array(np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650'))
vals = np.array(np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0'))[:,0]
N = len(vals)

# some basic spatial stuff
xs = np.linspace(min(xys[:,0])-2, max(xys[:,0])+1, 10)
ys = np.linspace(min(xys[:,1])-100, max(xys[:,1])+100, 10)
xr = max(xys[:,0]) - min(xys[:,0])  # ranges so distances can weight x and y equally
yr = max(xys[:,1]) - min(xys[:,1])
X, Y = np.meshgrid(xs, ys)    # meshgrid for contour and distance calcs

# set each gridpoint to the value of the closest data point:
Z = np.zeros((len(ys), len(xs), N))   # meshgrid arrays have shape (len(ys), len(xs))
for n in range(N):
    Z[:,:,n] = ((X-xys[n,0])/xr)**2 + ((Y-xys[n,1])/yr)**2  # stack arrays of distances to each points  
z = np.argmin(Z, axis=2)   # which data point is the closest to each grid point
v = vals[z]                # set the grid value to the data point value

# do the contour plot (use only the level 0.5 since values are 0 and 1)
plt.contour(X, Y, v, cmap=plt.cm.gray, levels=[.5])  # contour the data point values

# now plot the data points
pos = vals == 1   # boolean masks select the rows of each class
neg = vals == 0

plt.plot(xys[pos, 0], xys[pos, 1], 'ro')
plt.plot(xys[neg, 0], xys[neg, 1], 'bo')

plt.show()
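The loop that fills `Z` can also be written with NumPy broadcasting, which makes a finer grid (and a less blocky contour) cheap. What the code computes is essentially a 1-nearest-neighbour classifier evaluated on the grid. A sketch, with `nearest_label_grid` as a hypothetical helper name and a 200 x 200 grid as an arbitrary choice:

```python
import numpy as np

def nearest_label_grid(xys, vals, X, Y):
    """Value of the closest data point at every grid point (1-nearest neighbour).

    Same range-scaled squared distance as the loop in the answer, but
    broadcasting builds the X.shape + (N,) distance array in one expression.
    """
    xr = xys[:, 0].max() - xys[:, 0].min()
    yr = xys[:, 1].max() - xys[:, 1].min()
    d2 = ((X[..., None] - xys[:, 0]) / xr) ** 2 + ((Y[..., None] - xys[:, 1]) / yr) ** 2
    return vals[np.argmin(d2, axis=-1)]

xys = np.array([[2, 300], [4, 600], [7, 300], [5, 500], [5, 400], [6, 400], [3, 400],
                [4, 500], [1, 200], [3, 400], [7, 700], [3, 550], [2.5, 650]])
vals = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0])

# same region as the answer's 10 x 10 grid, but 200 x 200
xs = np.linspace(xys[:, 0].min() - 2, xys[:, 0].max() + 1, 200)
ys = np.linspace(xys[:, 1].min() - 100, xys[:, 1].max() + 100, 200)
X, Y = np.meshgrid(xs, ys)
v = nearest_label_grid(xys, vals, X, Y)
```

Then `plt.contour(X, Y, v, levels=[.5])` draws the boundary as in the answer.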