How to fix SVR plot in Python sklearn

Question

I am trying to fit an SVR model to my dataset and plot the result using scikit-learn in Python.

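import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
# Load data (X_train and Y_train come from earlier in the script)
X_train_Occ = pd.DataFrame(X_train['occupancy'])
Y_train_Occ = np.asarray(Y_train)  # a pandas Series has no .reshape in current pandas
# Rescale both the feature and the target
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train_Occ_scaled = sc_X.fit_transform(X_train_Occ)
Y_train_Occ_scaled = sc_Y.fit_transform(Y_train_Occ.reshape(-1, 1))

regressor = SVR(kernel='rbf')
# SVR expects a 1-D target, so flatten the (n, 1) scaled column
regressor.fit(X_train_Occ_scaled, Y_train_Occ_scaled.ravel())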

I load my data into X and Y DataFrames, scale both, and fit the model, which prints the following:

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

Then I try to show the results of the regression with this:

import matplotlib.pyplot as plt

plt.scatter(X_train_Occ_scaled, Y_train_Occ_scaled, color='red')
plt.plot(X_train_Occ_scaled, regressor.predict(X_train_Occ_scaled), color='blue')
plt.title('Occupancy vs Flow (SVR)')
plt.xlabel('Occupancy')
plt.ylabel('Flow')
plt.show()

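This gives a plot in which the blue prediction line zig-zags back and forth between the points instead of following a single curve.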

Has the model overfitted the data, or is there something wrong with the code?

I am following the code from here: http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html

I am trying to draw the line of best fit with the model, not a line from each point.

What kind of plot did you expect? What you see is what you get when you call plt.plot() with inputs that don't conform to its assumptions. Read matplotlib's docs on what plot really does (basically, it draws lines between neighbouring entries of your inputs, in the order given). Maybe you want plt.scatter() or something else. Recommendation: remove one tag (e.g. non-linear reg) and replace it with matplotlib (one of the more important tags for this question). — sascha, Apr 11, 2018 at 14:47
(Super-short look at the reference code:) The reason your code fails and the referenced example does not is that their x is sorted. This will all become clear and simple once you read what matplotlib's plotting functions do! Use numpy's argsort to sort x and y in parallel and plot again. — sascha, Apr 11, 2018 at 15:20
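A minimal sketch of the argsort suggestion. The question's arrays are not reproduced here, so it uses hypothetical synthetic data (`x`, `y`, a noisy sine curve in random order); swap in `X_train_Occ_scaled` and `Y_train_Occ_scaled` from the question. The key point is that one index array from `np.argsort` reorders x and y together, so each (x, y) pair stays intact, whereas calling `np.sort` on x alone would break the pairing.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

# Hypothetical stand-in for the scaled occupancy/flow data, deliberately unsorted
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=100)

regressor = SVR(kernel="rbf").fit(x, y)

# argsort returns the permutation that orders the samples by x;
# applying the same permutation to x and y keeps each pair together
order = np.argsort(x[:, 0])
x_sorted = x[order]
y_sorted = y[order]

# scatter does not care about order, so the raw data can be drawn as-is
plt.scatter(x, y, color="red")
# with x ascending, plt.plot joins neighbouring predictions into one curve
plt.plot(x_sorted.ravel(), regressor.predict(x_sorted), color="blue")
plt.title("Occupancy vs Flow (SVR)")
plt.xlabel("Occupancy")
plt.ylabel("Flow")
plt.show()
```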

brian4342 · Accepted Answer · 2018-04-12 12:02:11Z


As suggested in the comments, the solution was to sort the data by the independent variable first, then fit the model and predict the outcome.
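One way to do the sorting, sketched with pandas under the assumption that the occupancy feature and the flow target live in a single DataFrame (the frame `df` and the column name `flow` are hypothetical; in the question they are separate `X_train` / `Y_train` objects that would first need to be combined so both columns are reordered together):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical training frame with the feature and target side by side
rng = np.random.default_rng(1)
occupancy = rng.uniform(0, 40, size=80)
df = pd.DataFrame({
    "occupancy": occupancy,
    "flow": 50 * np.sqrt(occupancy) + rng.normal(scale=10, size=80),
})

# Sort by the independent variable first; reset_index renumbers rows 0..n-1
df_sorted = df.sort_values("occupancy").reset_index(drop=True)

sc_X = StandardScaler()
sc_Y = StandardScaler()
X_scaled = sc_X.fit_transform(df_sorted[["occupancy"]])
Y_scaled = sc_Y.fit_transform(df_sorted[["flow"]]).ravel()

regressor = SVR(kernel="rbf").fit(X_scaled, Y_scaled)
Y_pred = regressor.predict(X_scaled)

plt.scatter(X_scaled, Y_scaled, color="red")
# standard scaling is increasing, so X_scaled is still ascending
plt.plot(X_scaled.ravel(), Y_pred, color="blue")
plt.title("Occupancy vs Flow (SVR)")
plt.xlabel("Occupancy")
plt.ylabel("Flow")
plt.show()
```

Sorting does not change the optimisation problem SVR solves; it matters here only because `plt.plot` connects points in the order they are given.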


How did you sort the data? I had the same problem. — Jean, over a year ago

hao · Accepted Answer · 2019-10-09 13:49:37Z


Don't use plt.plot here, since the data are in random order. Use plt.scatter instead, or sort the data from min to max first.
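A short sketch of the plt.scatter option, using hypothetical unsorted data (`X`, `Y`) in place of the question's scaled arrays. Because scatter draws each point independently, the predictions can be plotted without any sorting; the trade-off is that the fit appears as a cloud of markers rather than a continuous line.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

# Hypothetical unsorted data standing in for the question's scaled arrays
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(60, 1))
Y = X.ravel() ** 2 + rng.normal(scale=0.2, size=60)

regressor = SVR(kernel="rbf").fit(X, Y)
Y_pred = regressor.predict(X)

# scatter plots each (x, y) pair on its own, so input order does not matter
plt.scatter(X, Y, color="red", label="data")
plt.scatter(X, Y_pred, color="blue", s=10, label="SVR prediction")
plt.legend()
plt.show()
```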
