
So I'm starting with PyTorch and tried to begin with an easy linear regression example. I made a simple implementation of linear regression with PyTorch to learn the equation 2*x + 1, but the loss stays stuck at around 120 and gradient descent doesn't converge to a small loss value. I don't know why this is happening, and it's driving me crazy because I can't see what's wrong. This example should be very easy to solve. This is the code I'm using:

import torch
import torch.nn.functional as F
import numpy as np

# inputs and targets don't need requires_grad; autograd only tracks the model's parameters
X = np.arange(1, 20).reshape(-1, 1)
X = torch.tensor(X, dtype=torch.float32)
y = np.array([2*i + 1 for i in np.arange(1, 20)]).reshape(-1, 1)
y = torch.tensor(y, dtype=torch.float32)
print(X.shape, y.shape)

class LR(torch.nn.Module):
    def __init__(self, n_features, n_hidden1, n_out):
        super(LR, self).__init__()
        # one hidden layer followed by a linear output layer
        self.linear = torch.nn.Linear(n_features, n_hidden1)
        self.predict = torch.nn.Linear(n_hidden1, n_out)

    def forward(self, x):
        x = F.relu(self.linear(x))
        x = self.predict(x)
        return x

model = LR(1, 10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

def train(epochs=100):
    for e in range(epochs):
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"epoch: {e} and loss= {loss.item()}")

train()

The desired output is a small loss value, so that the trained model gives good predictions later.

1 Answer


Your learning rate is too large. The model takes a few steps in the right direction, but it can't settle on an actually good minimizer and instead zigzags around it. If you try lr=0.001 instead, your performance will be much better. This is why it's often useful to decay your learning rate over time when using first-order optimizers.
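As a rough sketch of what that looks like (reusing the model, X, y, and loss_fn from the question; the step_size and gamma values here are just illustrative, not tuned), you can lower the learning rate and decay it with a scheduler such as torch.optim.lr_scheduler.StepLR:

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # smaller step size
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # decay lr 10x every 50 epochs

for e in range(200):
    pred = model(X)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # apply the decay schedule once per epoch
    print(f"epoch: {e} and loss= {loss.item()}")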


2 Comments

It's weird, I tried lowering the learning rate yesterday and didn't get a good result either, but now it works! Can you tell me what you specifically mean by a first-order optimizer?
A first-order optimizer like gradient descent only uses first-order derivative information when taking a step. Compare this to a second-order solver like Newton's method, which computes the Hessian. The latter takes better steps but scales more poorly to high-dimensional spaces. Consider this link, for example: math.stackexchange.com/questions/2201384/…
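To make the distinction concrete, here is a toy 1-D sketch (not from the thread; f(x) = (x - 3)**2 is just an illustrative function) comparing a gradient descent step, which uses only the first derivative f'(x), with a Newton step, which also uses the second derivative f''(x):

# toy 1-D example: minimize f(x) = (x - 3)**2
def f_prime(x):         # f'(x) = 2*(x - 3)
    return 2 * (x - 3)

def f_double_prime(x):  # f''(x) = 2, the 1-D analogue of the Hessian
    return 2.0

x_gd, x_newton, lr = 10.0, 10.0, 0.1
for _ in range(5):
    x_gd -= lr * f_prime(x_gd)                                 # first-order step: x <- x - lr * f'(x)
    x_newton -= f_prime(x_newton) / f_double_prime(x_newton)   # Newton step: x <- x - f'(x) / f''(x)
    print(f"gd: {x_gd:.4f}  newton: {x_newton:.4f}")

For this quadratic, Newton's method lands on the minimum at x = 3 in a single step, while gradient descent only approaches it at a rate set by lr. The trade-off is that in n dimensions the Hessian is an n-by-n matrix, which is what makes second-order methods scale poorly.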

