
So I'm starting with PyTorch and tried to begin with an easy linear regression example. I made a simple implementation of linear regression with PyTorch to learn the equation 2*x + 1, but the loss stays stuck at around 120 and gradient descent doesn't converge to a small loss value. I don't know why this is happening, and it's driving me crazy because I can't see what's wrong. This example should be very easy to solve. This is the code I'm using:

import torch
import torch.nn.functional as F
import numpy as np

# inputs and targets don't need requires_grad; autograd only tracks the model's parameters
X = np.arange(1, 20).reshape(-1, 1)
X = torch.tensor(X, dtype=torch.float32)
y = np.array([2*i + 1 for i in np.arange(1, 20)]).reshape(-1, 1)
y = torch.tensor(y, dtype=torch.float32)
print(X.shape, y.shape)

class LR(torch.nn.Module):
    def __init__(self, n_features, n_hidden1, n_out):
        super(LR, self).__init__()
        # one hidden layer followed by a linear output layer
        self.linear = torch.nn.Linear(n_features, n_hidden1)
        self.predict = torch.nn.Linear(n_hidden1, n_out)

    def forward(self, x):
        x = F.relu(self.linear(x))
        x = self.predict(x)
        return x

model = LR(1, 10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

def train(epochs=100):
    for e in range(epochs):
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"epoch: {e} and loss= {loss.item()}")

train()

The desired output is a small loss value, so that the trained model gives good predictions later.

1 Answer


Your learning rate is too large. The model takes a few steps in the right direction, but it can't settle on an actually good minimizer and instead zigzags around it. If you try lr=0.001 instead, your performance will be much better. This is why it's often useful to decay your learning rate over time when using first-order optimizers.
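As a rough sketch of what that looks like (reusing the model, X, y, and loss_fn from the question; the step_size and gamma values here are just illustrative, not tuned), you can lower the learning rate and decay it with a scheduler such as torch.optim.lr_scheduler.StepLR:

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # smaller step size
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # decay lr 10x every 50 epochs

for e in range(200):
    pred = model(X)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # apply the decay schedule once per epoch
    print(f"epoch: {e} and loss= {loss.item()}")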


2 Comments

It's weird, I tried lowering the learning rate yesterday and didn't get a good result either, but now it works! Can you tell me what you specifically mean by a first-order optimizer?
A first-order optimizer like gradient descent only uses first-order derivative information when taking a step. Compare this to a second-order solver like Newton's method, which computes the Hessian. The latter takes better steps but scales more poorly to high-dimensional spaces. Consider this link, for example: math.stackexchange.com/questions/2201384/…
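To make the distinction concrete, here is a toy 1-D sketch (not from the thread; f(x) = (x - 3)**2 is just an illustrative function) comparing a gradient descent step, which uses only the first derivative f'(x), with a Newton step, which also uses the second derivative f''(x):

# toy 1-D example: minimize f(x) = (x - 3)**2
def f_prime(x):         # f'(x) = 2*(x - 3)
    return 2 * (x - 3)

def f_double_prime(x):  # f''(x) = 2, the 1-D analogue of the Hessian
    return 2.0

x_gd, x_newton, lr = 10.0, 10.0, 0.1
for _ in range(5):
    x_gd -= lr * f_prime(x_gd)                                 # first-order step: x <- x - lr * f'(x)
    x_newton -= f_prime(x_newton) / f_double_prime(x_newton)   # Newton step: x <- x - f'(x) / f''(x)
    print(f"gd: {x_gd:.4f}  newton: {x_newton:.4f}")

For this quadratic, Newton's method lands on the minimum at x = 3 in a single step, while gradient descent only approaches it at a rate set by lr. The trade-off is that in n dimensions the Hessian is an n-by-n matrix, which is what makes second-order methods scale poorly.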

