I have an issue with this linear regression model. The scatter plot shows data points well into the negative range, even though the data set contains no negative values. I've checked the shapes and minimum values, so the graph should not be showing negative values, but I cannot figure out why the scatter plot suggests they are present.

Code for the metrics definition:

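import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(y_test, price_pred):
    # Note: price_linear is read from the enclosing (global) scope, not passed in
    gradient = price_linear.coef_
    intercept = price_linear.intercept_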

    mile_mae = mean_absolute_error(y_test, price_pred)
    mile_mse = mean_squared_error(y_test, price_pred)
    mile_rmse = np.sqrt(mile_mse)
    mile_r2 = r2_score(y_test, price_pred)

    print(f'Gradient: {gradient}\n')
    print(f' Intercept: {intercept}')

    print(f' Mean absolute error: {mile_mae}')
    print(f' Mean squared error: {mile_mse}')
    print(f' Root mean squared error: {mile_rmse}')
    print(f' Coefficient of determination: {mile_r2}')

Code for the linear regression model:

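import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

numerical_inputs = ['Mileage', 'Year of manufacture', 'Engine size']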

x = df[numerical_inputs]
y = df['Price']

# splitting of the data 
x_num_train, x_num_test, y_price_train, y_price_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

# scaling the numerical data 
scale = StandardScaler()

# fitting only to train data to prevent data leakage 
scale.fit(x_num_train)
num_train_scaled = scale.transform(x_num_train)
num_test_scaled = scale.transform(x_num_test)

multi_price_linear = LinearRegression()

multi_price_linear.fit(num_train_scaled, y_price_train)

multi_price_pred = multi_price_linear.predict(num_test_scaled)

evaluate_model(y_price_test, multi_price_pred)

# plt.show()
plt.figure(figsize=(14, 8))
plt.scatter(y_price_test, multi_price_pred, alpha=0.6)
plt.plot([min(y_price_test), max(y_price_test)],
         [min(y_price_test), max(y_price_test)], color='red')
plt.ylabel('Actual Price')
plt.xlabel('Predicted Price')
plt.title('Predicted Price vs Actual Price')
plt.show()

Which results in the following output:

Gradient: [-2720.41736808  9520.41488938  6594.02448017] 
 Intercept: 13854.628699999997

 Mean absolute error: 6091.458141656242 
 Mean squared error: 89158615.76017143 
 Root mean squared error: 9442.38400829851 
 Coefficient of determination: 0.671456306417368

Here is an image of the scatter plot:

[Image: scatter plot with negative values]

I don't want to simply limit the graph to hide the negative values if they indicate some issue with the data or code. Thank you! The full version of my code is linked here: google code

  • Please fix your indentation, add imports, and include enough data to reproduce the problem. Commented Dec 13, 2024 at 13:34
  • @jared I have updated and added the full code file Commented Dec 13, 2024 at 14:18
  • Your plot (in Colab, not the one above, where the label is wrong) shows that when the real price is small (not far from 0), the predicted price is sometimes negative and sometimes positive. That is possible because linear regression is only an approximate model. Have a look at the rows where the predicted price is negative to see if there is something wrong. Commented Dec 13, 2024 at 15:51

1 Answer

You accidentally swapped the axis labels. Your predicted values are plotted on the Y axis and your actual values are plotted on the X axis, so the negative values in the plot are predictions, not data.
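
A minimal sketch of a plot whose labels match the plotted data, assuming you keep the actual prices on the X axis. The helper name `plot_actual_vs_predicted` and the sample values are hypothetical, made up for illustration. The reference line here spans both series, which is a small change from the question's code, so it stays visible when predictions fall below the smallest actual value.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(actual, predicted):
    """Scatter actual (X) against predicted (Y) with labels that match the data."""
    fig, ax = plt.subplots(figsize=(14, 8))
    ax.scatter(actual, predicted, alpha=0.6)

    # y = x reference line covering the full range of both series.
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    ax.plot([lo, hi], [lo, hi], color="red")

    ax.set_xlabel("Actual Price")      # first argument of scatter -> X axis
    ax.set_ylabel("Predicted Price")   # second argument of scatter -> Y axis
    ax.set_title("Predicted Price vs Actual Price")
    return fig, ax


# Made-up values for illustration only; one prediction is negative.
actual = [1000, 6000, 12000, 20000, 30000]
predicted = [-2400, 10800, 17400, 20700, 22400]
fig, ax = plot_actual_vs_predicted(actual, predicted)
```

With the question's variables this would be called as `plot_actual_vs_predicted(list(y_price_test), list(multi_price_pred))`, followed by `plt.show()` on an interactive backend. Any negative values then appear along the axis labelled "Predicted Price".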

2 Comments

Thank you, I tried the switch around but it's still showing the negative values
Make sure you're only switching either the labels or the data, not both, i.e.:
    plt.scatter(multi_price_pred, y_price_test, alpha=0.6)
    plt.ylabel('Actual Price')
    plt.xlabel('Predicted Price')