
I am implementing a linear regression model from scratch, that is, without using the Sklearn package.

Everything was working just fine until I tried plotting the result.

My fit line isn't showing. (Screenshot: the plot displays only the scattered data points, with no fit line.)

I looked at a bunch of solutions, but none of them addressed my problem.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')
data = pd.read_csv(r'C:\Salary.csv')

x=data['Salary']

y=data['YearsExperience']


#y= mx+b

m = 0
b = 0

Learning_Rate = .01
epochs = 5000

n = np.float(x.shape[0])
error = []

for i in range(epochs):
   Y_hat = m*x+b

#error
   mse= (1/n)*np.sum((y-Y_hat)**2)
   error.append(mse)

#gradient descent
   db = (-2/n) * np.sum(x*(y-Y_hat))
   dm = (-2/n) * np.sum((y-Y_hat))

   m = m - Learning_Rate * dm
   b = b - Learning_Rate * db

#tracing x and y line
x_line = np.linspace(0, 15, 100)
y_line = (m*x_line)+ b



#plotting result
plt.figure(figsize=(8,6))

plt.title('LR result')
plt.plot(x_line,y_line) #the problem is apparently here
                        # i just don't know what to do
plt.scatter(x,y)
plt.show()

Apart from that, there is no problem with the code.

  • Please turn off the warnings filter, look at the output/final value of x_line and y_line, and you will see what's making pyplot unable to plot the line. Commented Dec 8, 2022 at 20:28
  • x_line is just fine; I think the problem is with y_line. Commented Dec 8, 2022 at 20:40
  • @Walid For a start, you should remove that horrible line warnings.filterwarnings('ignore'). You should replace np.float by float, as indicated by the warning. Then you should test your code with just a few epochs, e.g. 10. You should also start debugging your for loop, e.g. by printing the values of m and b at each step. Commented Dec 8, 2022 at 20:54
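
A minimal sketch of the debugging the commenters suggest, using a hypothetical stand-in for Salary.csv (the real file isn't available here): run only a few epochs and print m and b at each step, which makes the divergence visible immediately.

import numpy as np

# hypothetical stand-in data: x = salary (~4e4..1.4e5), y = years of experience (~1..13)
rng = np.random.default_rng(0)
x = rng.integers(40_000, 140_000, size=30).astype(float)
y = 12 / 100_000 * (x - 40_000) + 1

m, b = 0.0, 0.0
Learning_Rate = .01

for i in range(10):                              # a handful of epochs is enough
    Y_hat = m*x + b
    db = (-2/len(x)) * np.sum(x*(y - Y_hat))     # the question's (swapped) gradients
    dm = (-2/len(x)) * np.sum(y - Y_hat)
    m = m - Learning_Rate*dm
    b = b - Learning_Rate*db
    print(f'epoch {i}: m = {m:.3e}, b = {b:.3e}')   # magnitudes explode toward overflow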

2 Answers


Your code has multiple problems:

  • you are plotting the line between 0 and 15, while the data range from about 40,000 to 140,000. Even if you compute the line correctly, you will be drawing it in a region far away from your data

  • in the loop there is a mistake in the computation of dm and db: they are swapped. The corrected expressions are:

    dm = (-2/n)*np.sum(x*(y - Y_hat))
    db = (-2/n)*np.sum((y - Y_hat))
    
  • your x and y data are on very different scales: x is of magnitude ~10⁴, while y is ~10¹. For this reason, m and b will also likely end up with very different orders of magnitude. This is why you should use two different learning rates for the two quantities you are optimizing: Learning_Rate_m for m and Learning_Rate_b for b (an alternative, rescaling x, is sketched just after this list)

  • finally, the gradient descent method is strongly affected by the initial guess: it may find local minima (spurious solutions) instead of the global minimum (the true solution). For this reason, you should try different initial guesses for m and b, ideally close to their estimated values:

    m = 0
    b = -2
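
As an aside, an alternative to using two learning rates is to standardize x before fitting, so a single learning rate works for both parameters, and then map the coefficients back to the original scale. A minimal sketch:

import numpy as np

def fit_scaled(x, y, lr=0.1, epochs=5000):
    # standardize x so it has magnitude ~1, comparable to y
    mu, sigma = x.mean(), x.std()
    xs = (x - mu) / sigma
    m_s, b_s = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        Y_hat = m_s*xs + b_s
        dm = (-2/n) * np.sum(xs*(y - Y_hat))
        db = (-2/n) * np.sum(y - Y_hat)
        m_s = m_s - lr*dm
        b_s = b_s - lr*db
    # undo the standardization: y = m_s*(x - mu)/sigma + b_s
    return m_s/sigma, b_s - m_s*mu/sigma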
    

Complete Code

import numpy as np
import matplotlib.pyplot as plt


N = 40
np.random.seed(42)
x = np.random.randint(low = 38000, high = 145000, size = N)
y = (13 - 1)/(140000 - 40000)*(x - 40000) + 1 + 0.5*np.random.randn(N)


# initial guess
m = 0
b = -2

Learning_Rate_m = 1e-10
Learning_Rate_b = 1e-2
epochs = 5000

n = float(x.shape[0])  # np.float was removed in NumPy 1.24; use the builtin float
error = []


for i in range(epochs):
   Y_hat = m*x + b

   mse = 1/n*np.sum((y - Y_hat)**2)
   error.append(mse)

   dm = -2/n*np.sum(x*(y - Y_hat))
   db = -2/n*np.sum((y - Y_hat))

   m = m - Learning_Rate_m*dm
   b = b - Learning_Rate_b*db


x_line = np.linspace(x.min(), x.max(), 100)
y_line = (m*x_line) + b


plt.figure(figsize=(8,6))

plt.title('LR result')
plt.plot(x_line,y_line, 'red')

plt.scatter(x,y)
plt.show()
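
As a quick sanity check, the learned parameters can be compared against NumPy's closed-form least-squares fit (this reuses x, y, m, b from the script above):

m_ref, b_ref = np.polyfit(x, y, 1)   # slope and intercept of the exact least-squares line
print(f'gradient descent: m = {m:.6f}, b = {b:.3f}')
print(f'np.polyfit:       m = {m_ref:.6f}, b = {b_ref:.3f}')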

Plot

(Figure: the red fitted line passing through the scatter of the synthetic data.)




The problem is not happening while plotting; the problem is with the parameters passed to plt.plot(x_line, y_line). I tested your code and found that y_line is all NaN values, so double-check the calculations (y_line, m, dm).
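
A quick way to see this for yourself (a sketch, assuming the variables from the question's script and with the warnings filter removed):

import numpy as np

# With warnings enabled, NumPy reports 'RuntimeWarning: overflow encountered'
# during the loop; warnings.filterwarnings('ignore') was hiding it. Afterwards:
print(m, b)                      # nan nan: the swapped updates diverged and overflowed
print(np.isnan(y_line).all())    # True: so matplotlib silently draws no line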

Comments

I really don't know.
How should I do it then? I used the math formula, technically.
Can you give me the source where you got this from? `for i in range(epochs): Y_hat = m*x+b; mse = (1/n)*np.sum((y-Y_hat)**2); error.append(mse); db = (-2/n)*np.sum(x*(y-Y_hat)); dm = (-2/n)*np.sum(y-Y_hat); m = m - Learning_Rate*dm; b = b - Learning_Rate*db`
It was in a YouTube video from the StatQuest channel.
