
I am trying to train a model in PyTorch.

input: 686-array
first layer: 64-array
second layer: 2-array
output: prediction, either 1 or 0

this is what I have so far:

import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder_softmax = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2),
            nn.Softmax()
        )

    def forward(self, x):
        x = self.encoder_softmax(x)
        return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

net = autoencoder().to(device)


iterations = 10
learning_rate = 0.98
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(
    net.parameters(), lr=learning_rate, weight_decay=1e-5)


for epoch in range(iterations):
    loss = 0.0
    print("train_dl len: ", len(train_dl))

    # net.train()
    for i, data in enumerate(train_dl, 0):
        inputs, labels, vectorize = data

        labels = labels.long().to(device)
        inputs = inputs.float().to(device)
        optimizer.zero_grad()
        outputs = net(inputs)

        train_loss = criterion(outputs, labels)

        train_loss.backward()
        optimizer.step()

        loss += train_loss.item()


    loss = loss / len(train_dl)

But when I train the model, the loss does not go down. What am I doing wrong?

Comment: Did you try reducing the learning rate? How big is your training dataset? (May 16, 2020 at 13:27)

1 Answer


You're using nn.CrossEntropyLoss as the loss function, which already applies log-softmax internally (it combines nn.LogSoftmax and nn.NLLLoss), but you also apply softmax in the model. Running softmax twice flattens the predictions and shrinks the gradients, which is why the loss barely moves:

self.encoder_softmax = nn.Sequential(
    nn.Linear(686, 256),
    nn.ReLU(True),
    nn.Linear(256, 2),
    nn.Softmax() # <- needs to be removed
)

The output of your model should be the raw logits, without the nn.Softmax.
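
As a minimal sketch of the fix (the attribute is renamed from encoder_softmax to encoder here, since it no longer ends in a softmax), the model returns raw logits; the sanity check below shows that nn.CrossEntropyLoss on logits matches nn.NLLLoss on log-softmax output, and probabilities can still be computed explicitly at inference time:

import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        # no nn.Softmax here: nn.CrossEntropyLoss applies log-softmax itself
        self.encoder = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2)
        )

    def forward(self, x):
        # return raw logits, which is what nn.CrossEntropyLoss expects
        return self.encoder(x)

# sanity check: CrossEntropyLoss on logits == NLLLoss on log-softmax output
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True

# if you need class probabilities at inference time, apply softmax then:
probs = torch.softmax(logits, dim=1)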

You should also lower the learning rate: 0.98 is very high, which makes training much less stable, and you'll likely see the loss oscillate. A more appropriate learning rate would be on the order of 0.01 or 0.001.
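
For example, a sketch reusing the question's optimizer setup with a smaller step size:

optimizer = torch.optim.Adam(
    net.parameters(), lr=1e-3, weight_decay=1e-5)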
