I am trying to fit an SVR model to my dataset and plot the result using scikit-learn in Python.
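import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
# Load data: one feature (occupancy) and the target (flow)
X_train_Occ = pd.DataFrame(X_train['occupancy'])
Y_train_Occ = Y_train
# Rescale X and y separately to zero mean and unit variance
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train_Occ_scaled = sc_X.fit_transform(X_train_Occ)
# StandardScaler needs 2D input; np.asarray works whether Y_train is a Series or an array
Y_train_Occ_scaled = sc_Y.fit_transform(np.asarray(Y_train_Occ).reshape(-1, 1))
# SVR expects a 1D target, so flatten the scaled y before fitting
regressor = SVR(kernel='rbf')
regressor.fit(X_train_Occ_scaled, Y_train_Occ_scaled.ravel())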
I load my data into X and Y DataFrames and scale them. See the plot below:
I then get the following output:
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
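Because both X and y are scaled before fitting, `regressor.predict` returns values in scaled units. The sketch below shows one way to get predictions back in the original flow units with `sc_Y.inverse_transform`. It is self-contained and uses hypothetical synthetic occupancy/flow data, since the real dataset is not shown; the sine-shaped relationship is purely an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in data: the real occupancy/flow values are not in the question
rng = np.random.RandomState(0)
occupancy = rng.uniform(0, 100, size=200).reshape(-1, 1)  # 2D: (n_samples, 1 feature)
flow = 50 * np.sin(occupancy.ravel() / 30) + rng.normal(0, 5, size=200)  # 1D target

# Scale X and y separately, keeping both scalers so the scaling can be undone later
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_scaled = sc_X.fit_transform(occupancy)
y_scaled = sc_Y.fit_transform(flow.reshape(-1, 1)).ravel()  # back to 1D for SVR

regressor = SVR(kernel='rbf')
regressor.fit(X_scaled, y_scaled)

# Predict for new occupancy values given in original units
new_occupancy = np.array([[10.0], [50.0], [90.0]])
pred_scaled = regressor.predict(sc_X.transform(new_occupancy))
# inverse_transform expects 2D input, so reshape before undoing the y scaling
pred_flow = sc_Y.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
print(pred_flow)
```

The same two calls (`sc_X.transform` on the way in, `sc_Y.inverse_transform` on the way out) apply to the question's `regressor`, `sc_X` and `sc_Y`.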
Then I try to show the results of the regression with this:
import matplotlib.pyplot as plt
# Red points: scaled training data; blue line: SVR predictions joined in data order
plt.scatter(X_train_Occ_scaled, Y_train_Occ_scaled, color='red')
plt.plot(X_train_Occ_scaled, regressor.predict(X_train_Occ_scaled), color='blue')
plt.title('Occupancy vs Flow (SVR)')
plt.xlabel('Occupancy')
plt.ylabel('Flow')
plt.show()
This gives the following plot:
Has the model overfitted the data, or is there something wrong with the code?
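Independently of the plot, one common way to check for overfitting is to compare the score on the training data with a cross-validated score: a high training score paired with a much lower cross-validated score suggests overfitting. The sketch below uses hypothetical synthetic data and swaps the manual scalers for `make_pipeline(StandardScaler(), ...)` (scales X) plus `TransformedTargetRegressor` (scales y), so that the scalers are refit inside each fold rather than on all the data. This is an illustrative substitute, not the question's code.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in data for occupancy (X) and flow (y)
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, size=(200, 1))
y = 50 * np.sin(X.ravel() / 30) + rng.normal(0, 5, size=200)

# Scale X inside the pipeline and y via TransformedTargetRegressor,
# so each CV fold fits its own scalers on its own training split
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='rbf')),
    transformer=StandardScaler(),
)

train_r2 = model.fit(X, y).score(X, y)  # R^2 on the data the model was fit on
cv_r2 = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(f"train R^2 = {train_r2:.3f}, mean CV R^2 = {cv_r2.mean():.3f}")
```

If the two numbers are close, the odd-looking plot is more likely a plotting issue than overfitting.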
I am following the code from here: http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html
I am trying to draw the model's line of best fit, not a line connecting each data point.
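One way to get a single smooth best-fit line is to evaluate the fitted model on an evenly spaced, increasing grid of x values spanning the training range, and pass that grid to `plt.plot`. The helpers `best_fit_curve` and `plot_fit` below are hypothetical names for this sketch, and the data is synthetic, already-scaled and deliberately unsorted to mimic the question; matplotlib is imported inside `plot_fit` so the fitting part runs even where matplotlib is not installed.

```python
import numpy as np
from sklearn.svm import SVR


def best_fit_curve(regressor, X_scaled, n_points=300):
    # Evenly spaced, increasing x values over the training range (2D for predict)
    x_grid = np.linspace(X_scaled.min(), X_scaled.max(), n_points).reshape(-1, 1)
    return x_grid.ravel(), regressor.predict(x_grid)


def plot_fit(X_scaled, y_scaled, x_grid, y_grid):
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing
    plt.scatter(np.ravel(X_scaled), np.ravel(y_scaled), color='red', s=10, label='data')
    plt.plot(x_grid, y_grid, color='blue', label='SVR fit')  # grid is sorted, so one curve
    plt.title('Occupancy vs Flow (SVR)')
    plt.xlabel('Occupancy (scaled)')
    plt.ylabel('Flow (scaled)')
    plt.legend()
    plt.show()


# Hypothetical scaled data in random (unsorted) order
rng = np.random.RandomState(0)
X_scaled = rng.uniform(-2, 2, size=(150, 1))
y_scaled = np.sin(X_scaled.ravel()) + rng.normal(0, 0.1, size=150)

regressor = SVR(kernel='rbf').fit(X_scaled, y_scaled)
x_grid, y_grid = best_fit_curve(regressor, X_scaled)
```

With the question's variables this would be called as `plot_fit(X_train_Occ_scaled, Y_train_Occ_scaled, *best_fit_curve(regressor, X_train_Occ_scaled))`. A grid also makes the curve smooth between sparse data points, which sorting the training points alone does not.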



You are calling plt.plot() with inputs that do not conform to its assumptions. Read matplotlib's docs on what plot really does (basically, it draws lines between neighboring points in the order they appear in your inputs). Maybe you want plt.scatter() or something else. Recommendation: remove one tag (e.g. non-linear reg) and replace it with matplotlib (one of the more important tags for this question).
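Since plt.plot() joins consecutive points in input order, unsorted x values produce the zig-zag described in the question. A minimal sketch of the fix the comment points at is to sort the points by x (here with `np.argsort`) before drawing the line; `sorted_for_line_plot` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
from sklearn.svm import SVR


def sorted_for_line_plot(X, y_pred):
    # plt.plot joins consecutive points, so order them by increasing x first
    order = np.argsort(X.ravel())
    return X.ravel()[order], y_pred[order]


# Hypothetical unsorted training data, like a typical shuffled training set
rng = np.random.RandomState(1)
X = rng.uniform(-2, 2, size=(100, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.1, size=100)
regressor = SVR(kernel='rbf').fit(X, y)

x_line, y_line = sorted_for_line_plot(X, regressor.predict(X))
# Then: plt.scatter(X, y, color='red'); plt.plot(x_line, y_line, color='blue')
```

Applied to the question, `x_line, y_line = sorted_for_line_plot(X_train_Occ_scaled, regressor.predict(X_train_Occ_scaled))` would replace the arguments to the `plt.plot` call.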