I'm a new student in AI, currently learning linear regression. I'm using the California housing dataset for my experiments. My goal is to predict the 'population' column from the 'total_rooms' column. I used the following formula and code to compute the slope 'm' and intercept 'c'.
Formula: $$ m = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2 }$$ $$ c = \bar{y} - m \bar{x} $$
The code is as follows and it works perfectly:
# Linear regression using the above formula
import numpy as np
import matplotlib.pyplot as plt

# 'train' is the training split of the California housing data (a pandas DataFrame)
x_vals = np.array(train['total_rooms'])
y_vals = np.array(train['population'])
xm = np.mean(x_vals)
ym = np.mean(y_vals)

def compute_m(x_vals, y_vals):
    n = len(x_vals)
    sum_xy, sum_xx = 0, 0
    for i in range(n):
        sum_xy += (x_vals[i] - xm) * (y_vals[i] - ym)
        sum_xx += (x_vals[i] - xm)**2
    return sum_xy / sum_xx

m = compute_m(x_vals, y_vals)
c = ym - m*xm

xl = np.array([np.min(x_vals), np.max(x_vals)])
yl = m*xl + c
plt.scatter(x_vals, y_vals)
plt.plot(xl, yl, 'r')
plt.show()
print('m, c:', m, c)
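For reference, the same formula can also be written in a fully vectorized way. This is just a minimal equivalent sketch of the loop above (not part of my original script), assuming x_vals and y_vals are defined as in the previous block:

# Vectorized version of the closed-form formula (equivalent to compute_m above)
import numpy as np

def closed_form_fit(x, y):
    xm, ym = x.mean(), y.mean()
    m = np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2)
    c = ym - m * xm
    return m, c

# m2, c2 = closed_form_fit(x_vals, y_vals)  # should match m and c from the loop version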
To verify that my code is working, I also checked it against the built-in linear regression in scikit-learn, and it returns the exact same answer:
# Now use the scikit-learn library
from sklearn.linear_model import LinearRegression

model = LinearRegression()
ans = model.fit(x_vals.reshape(-1, 1), y_vals.reshape(-1, 1))
score = ans.score(x_vals.reshape(-1, 1), y_vals.reshape(-1, 1))
intercept, coef = ans.intercept_, ans.coef_
print('results from scikit-learn:', score, intercept, coef)
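To make the agreement explicit, here is a small sketch of how the two results could be compared, reusing m, c, coef, and intercept from the blocks above:

# Sanity check: the hand-rolled m, c should match scikit-learn's coef_/intercept_
import numpy as np

print('slope matches:    ', np.isclose(m, coef.ravel()[0]))
print('intercept matches:', np.isclose(c, intercept.ravel()[0]))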
But the problem arises when I try to use gradient descent for learning the slope m and intercept c. My code is as follows:
# Gradient descent
def loss_func(y, ypred):
    mse = (y - ypred)**2
    return mse

def gradient_loss(y, x, mc, bc):
    n = len(y)
    print(n)
    m_loss, b_loss = 0, 0
    for i in range(n):
        ml = (-2/n) * x[i] * (y[i] - mc*x[i] - bc)
        m_loss += ml
        bl = (-2/n) * (y[i] - mc*x[i] - bc)
        b_loss += bl
    return m_loss, b_loss

ep = 100000
alpha = 0.000000001  # learning rate
m, b = 0, 0
for e in range(ep):
    y_pred = m*x_vals + b
    m_loss, b_loss = gradient_loss(y_vals, x_vals, m, b)
    print("m, b, m_loss, b_loss:", m, b, m_loss, b_loss)
    m = m - m_loss*alpha
    b = b - b_loss*0.001  # note: a different, much larger step size for b than for m
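Written out, the gradients that gradient_loss computes (with $L$ denoting the mean squared error over the training set) are:
$$ L = \frac{1}{n}\sum_{i=1}^n (y_i - m x_i - b)^2 $$
$$ \frac{\partial L}{\partial m} = -\frac{2}{n}\sum_{i=1}^n x_i\,(y_i - m x_i - b), \qquad \frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^n (y_i - m x_i - b) $$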
I took the derivative of the squared loss and use the learning rate $\alpha$ to update the slope $m$ and the intercept $b$ (sorry, it is the same as $c$ in the previous block) for 100000 iterations. But note that I have to use a different learning rate for $m$ and $b$. No matter what learning rate I try, a single value does not work for both $m$ and $b$. If I write 'b = b - b_loss*alpha' in the last line, it never converges for any value of $\alpha$.
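To show what I mean, here is a minimal diagnostic sketch (not part of my script above) that reuses gradient_loss from the block above to compare the sizes of the two gradients at the starting point m = b = 0:

# Compare the magnitudes of the two gradients at the initial point m = b = 0
g_m, g_b = gradient_loss(y_vals, x_vals, 0, 0)
print('gradient w.r.t. m:', g_m)
print('gradient w.r.t. b:', g_b)
print('ratio |g_m / g_b|:', abs(g_m / g_b))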
I've never seen any article or book use different learning rates for the slope m and the intercept b. Can anybody please explain what's happening here, and why I had to use a different learning rate for each? :(
