
I am trying to train a model in PyTorch.

input: 686-array
first layer: 64-array
second layer: 2-array
output: prediction, either 1 or 0

this is what I have so far:

import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder_softmax = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2),
            nn.Softmax()
        )

    def forward(self, x):
        x = self.encoder_softmax(x)
        return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

net = autoencoder().to(device)


iterations = 10
learning_rate = 0.98
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(
    net.parameters(), lr=learning_rate, weight_decay=1e-5)


for epoch in range(iterations):
    loss = 0.0
    print("train_dl len: ", len(train_dl))

    # net.train()
    for i, data in enumerate(train_dl, 0):
        inputs, labels, vectorize = data

        labels = labels.long().to(device)
        inputs = inputs.float().to(device)
        optimizer.zero_grad()
        outputs = net(inputs)

        train_loss = criterion(outputs, labels)

        train_loss.backward()
        optimizer.step()

        loss += train_loss.item()


    loss = loss / len(train_dl)

But when I train the model, the loss does not go down. What am I doing wrong?

Comment: Did you try reducing the learning rate? How big is your training dataset? (May 16, 2020 at 13:27)

1 Answer


You're using nn.CrossEntropyLoss as the loss function, which already applies log-softmax internally (it combines nn.LogSoftmax and nn.NLLLoss), but you also apply softmax in the model. Running softmax twice flattens the predictions and shrinks the gradients, which is why the loss barely moves:

self.encoder_softmax = nn.Sequential(
    nn.Linear(686, 256),
    nn.ReLU(True),
    nn.Linear(256, 2),
    nn.Softmax() # <- needs to be removed
)

The output of your model should be the raw logits, without the nn.Softmax.
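
As a minimal sketch of the fix (the attribute is renamed from encoder_softmax to encoder here, since it no longer ends in a softmax), the model returns raw logits; the sanity check below shows that nn.CrossEntropyLoss on logits matches nn.NLLLoss on log-softmax output, and probabilities can still be computed explicitly at inference time:

import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        # no nn.Softmax here: nn.CrossEntropyLoss applies log-softmax itself
        self.encoder = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2)
        )

    def forward(self, x):
        # return raw logits, which is what nn.CrossEntropyLoss expects
        return self.encoder(x)

# sanity check: CrossEntropyLoss on logits == NLLLoss on log-softmax output
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True

# if you need class probabilities at inference time, apply softmax then:
probs = torch.softmax(logits, dim=1)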

You should also lower the learning rate: 0.98 is very high, which makes training much less stable, and you'll likely see the loss oscillate. A more appropriate learning rate would be on the order of 0.01 or 0.001.
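
For example, a sketch reusing the question's optimizer setup with a smaller step size:

optimizer = torch.optim.Adam(
    net.parameters(), lr=1e-3, weight_decay=1e-5)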
