
I am trying to fit an SVR model to my dataset and view the plot using scikit-learn in Python.

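import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Load data: 'occupancy' is the single feature, Y_train is the target (flow)
X_train_Occ = pd.DataFrame(X_train['occupancy'])
Y_train_Occ = np.asarray(Y_train).reshape(-1, 1)  # works for arrays and pandas objects

# Rescale both X and Y
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train_Occ_scaled = sc_X.fit_transform(X_train_Occ)
Y_train_Occ_scaled = sc_Y.fit_transform(Y_train_Occ)

# Fit an RBF-kernel SVR; SVR expects a 1-D target, hence ravel()
regressor = SVR(kernel='rbf')
regressor.fit(X_train_Occ_scaled, Y_train_Occ_scaled.ravel())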

I load my data into X and Y DataFrames and scale them. See the plot below:

[Image: plot of the scaled data]

I then get the following output:

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

Then I try to show the results of the regression with this:

import matplotlib.pyplot as plt

# Red: scaled training data; blue: SVR predictions, joined by a line in row order
plt.scatter(X_train_Occ_scaled, Y_train_Occ_scaled, color='red')
plt.plot(X_train_Occ_scaled, regressor.predict(X_train_Occ_scaled), color='blue')
plt.title('Occupancy vs Flow (SVR)')
plt.xlabel('Occupancy')
plt.ylabel('Flow')
plt.show()

Which gives the following plot:

[Image: resulting plot, with the blue line drawn from point to point rather than as a single curve]

Has the model overfitted the data, or is there something wrong with the code?

I am following the code from here: http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html

I am trying to draw the line of best fit with the model, not a line from each point.

  • What kind of plot did you expect? What you see is what you get when you call plt.plot() with inputs that don't match its assumptions. Read matplotlib's docs on what plot really does (basically, it draws lines between neighboring points in your inputs). Maybe you want plt.scatter() or something else. Recommendation: remove one tag (e.g. non-linear reg) and replace it with matplotlib (one of the more important tags for this question). Commented Apr 11, 2018 at 14:47
  • Thanks, I have updated my question. @sascha Commented Apr 11, 2018 at 15:18
  • (After a quick look at the reference code:) The reason your code fails and the referenced example does not is that their x is sorted. This will all become clear after reading what matplotlib's plotting functions do. Use NumPy's argsort to sort x and y in parallel and plot again. Commented Apr 11, 2018 at 15:20
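A minimal sketch of the argsort suggestion, assuming synthetic data in place of the question's occupancy/flow columns (the names `x`, `y` and the noisy sine target are illustrative only). plt.plot joins consecutive points, so the line only looks like a curve once x is in increasing order; the same index permutation from np.argsort is applied to x, y and the predictions so the pairs stay matched:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-in data: 40 unsorted x values with a noisy sine target
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(40, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=40)

regressor = SVR(kernel="rbf")
regressor.fit(x, y)

# argsort returns the permutation that orders x; reuse it for y and the predictions
order = np.argsort(x.ravel())
x_sorted = x[order]
y_sorted = y[order]
y_pred_sorted = regressor.predict(x_sorted)

plt.scatter(x_sorted.ravel(), y_sorted, color="red")
plt.plot(x_sorted.ravel(), y_pred_sorted, color="blue")  # neighbours now increase in x
plt.title("Sorted x: one continuous prediction curve")
plt.show()
```

Sorting only affects the drawing order here; the fitted model is the same as before.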

2 Answers


As suggested in the comments, the solution was to sort the data by the independent variable first, then fit the model to the data and predict the outcome.
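One way this might look with pandas, sketched under the assumption that `X_train` is a DataFrame with an `occupancy` column and `Y_train` is an aligned 1-D target; the small random DataFrame built below is hypothetical stand-in data, and the combined `data` frame with a `flow` column is an illustrative name. Sorting once on the raw data keeps X and Y rows paired, so the scaled arrays, the fit and the predictions all come out in occupancy order; inverse_transform maps the curve back to original units:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in for the question's data, deliberately in random order
rng = np.random.default_rng(1)
X_train = pd.DataFrame({"occupancy": rng.uniform(0, 100, size=60)})
Y_train = pd.Series(50 * np.tanh(X_train["occupancy"] / 30) + rng.normal(scale=2, size=60))

# Put feature and target in one frame and sort by the independent variable
data = X_train[["occupancy"]].assign(flow=Y_train.to_numpy()).sort_values("occupancy")

sc_X = StandardScaler()
sc_Y = StandardScaler()
X_scaled = sc_X.fit_transform(data[["occupancy"]])
Y_scaled = sc_Y.fit_transform(data[["flow"]])

regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, Y_scaled.ravel())

# Predictions follow the sorted row order; undo the target scaling for plotting
Y_pred = sc_Y.inverse_transform(regressor.predict(X_scaled).reshape(-1, 1))

plt.scatter(data["occupancy"], data["flow"], color="red")
plt.plot(data["occupancy"], Y_pred.ravel(), color="blue")
plt.title("Occupancy vs Flow (SVR)")
plt.xlabel("Occupancy")
plt.ylabel("Flow")
plt.show()
```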

[Image: resulting plot after sorting the data]


1 Comment

How did you sort the data? I had the same problem.

Don't use plt.plot as-is, since the data are in random order and plt.plot connects consecutive points. Use plt.scatter, or sort the data from min to max first.
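A related option, not the one described in this answer but common for drawing a smooth fitted curve: leave the training data in whatever order it has and evaluate the model on an evenly spaced grid running from the minimum to the maximum of x, which is increasing by construction. A minimal sketch with hypothetical data (`x_grid` and the 200-point resolution are illustrative choices):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVR

# Hypothetical unsorted feature (think "scaled occupancy") and noisy target
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=(50, 1))
y = x.ravel() ** 2 + rng.normal(scale=0.2, size=50)

regressor = SVR(kernel="rbf").fit(x, y)  # fit() returns the fitted estimator

# Evenly spaced points from min to max, already in increasing order
x_grid = np.linspace(x.min(), x.max(), 200).reshape(-1, 1)
y_grid = regressor.predict(x_grid)

plt.scatter(x.ravel(), y, color="red", label="data")
plt.plot(x_grid.ravel(), y_grid, color="blue", label="SVR fit")
plt.legend()
plt.show()
```

Because the curve is drawn from the grid rather than the training rows, it stays smooth even where training points are sparse.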
