
Hi, I have a project where I need to create a convolutional autoencoder trained on the MNIST dataset, with the constraint that I must not use pooling. My embedding dim is 16, and I need a 256 * 16 * 1 * 1 tensor as the output of my encoder.

I have written the following class to define my encoder:

class AutoEncoderCNN(nn.Module):
    def __init__(self, nb_channels, embedding_dim):
        super(AutoEncoderCNN, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=5, stride=1),
            nn.Sigmoid()
        )

    def encode(self, x):
        x = self.encoder(x)  # TO COMPLETE
        return x

    def decode(self, x):
        x = self.decoder(x)  # TO COMPLETE
        return x

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

But I get this dimension error when I try to train my network:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 256, 28, 28] to have 1 channels, but got 256 channels instead

My loss function:

loss_function = nn.MSELoss(size_average=None, reduce=None, reduction='mean')

My optimizer:

optimizer = optim.Adam(modelcnn.parameters(), lr=learning_rate)

My dataloader:

mnistTrainLoader = DataLoader(mnistTrainSet_clean, batch_size=batch_size, shuffle=True, num_workers=0)

My training loop:

# Training procedure for the model, using a dataloader, an optimizer, and a number of epochs
def train(model, data_loader, opt, n_epochs):
    losses = []
    i = 0
    for epoch in range(n_epochs):  # Loop over the epochs
        running_loss = 0.0

        for features, labels in data_loader:

            # TO COMPLETE
            # Forward pass
            labels_pred = model(features)  # Equivalent to model.forward(features)

            # Compute the loss
            loss = loss_function(labels_pred, labels)

            # Save the loss for later plotting
            losses.append(loss.item())

            # Clear the previous gradients
            optimizer.zero_grad()

            # Compute the gradients (backpropagation)
            loss.backward()

            # Update the weights: one optimizer step
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 10 == 9:
                print('[Epoch: %d, iteration: %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 10))
                running_loss = 0.0
            i += 1

    print('Training finished')
    return losses

I have tried many things to solve it, but nothing works. Can anyone help me, please?

  • Hey, could you post more code? Of your dataloader and train loop? Commented Jan 24, 2023 at 10:47
  • How do you load your dataset, and how do you pass the data? It could be that you are not using batches and are thereby lacking a dimension. Unrelated for now: also check out the last conv layer of the encoder. Commented Jan 24, 2023 at 10:48
  • @Daraan I have modified my question; you can see how I load my dataset. Commented Jan 24, 2023 at 11:36
  • @TheodorPeifer I have corrected my question; you can see this information now. Commented Jan 24, 2023 at 11:37

1 Answer


In the encoder, you're repeating:

nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU(),
nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU()

Just delete the duplication, and shapes will fit.

Note: as the output of your encoder, you'll have a shape of batch_size * 256 * h' * w'. 256 is the number of channels output by the last convolution in the encoder, and h', w' depend on the size h, w of the input image after it passes through the convolutional layers.
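For example, you can check h' and w' empirically with a dummy input. A minimal sketch, assuming 28 * 28 MNIST images and the deduplicated encoder from the question:

import torch
from torch import nn

# The deduplicated encoder from the question: five 5x5 convs, stride 1, no padding.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=5, stride=1), nn.ReLU(),
)

# Each 5x5 conv with stride 1 and no padding shrinks H and W by 4:
# 28 -> 24 -> 20 -> 16 -> 12 -> 8
x = torch.zeros(1, 1, 28, 28)  # one dummy MNIST image: (N, C, H, W)
print(encoder(x).shape)        # torch.Size([1, 256, 8, 8])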

You're not using nb_channels and embedding_dim anywhere. And I can't see what you mean by embedding_dim, since you're only using convolutions and no fully connected layers.
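If you really need a batch_size * 16 * 1 * 1 embedding without pooling, one option is a final convolution whose kernel covers the whole remaining feature map. This is only a sketch, continuing from the dummy-input example above (the name embedding_head is hypothetical, and I'm assuming 28 * 28 inputs so the encoder ends at 256 * 8 * 8):

# Hypothetical embedding head: collapses the 8x8 feature map to 1x1
# and maps 256 channels down to embedding_dim = 16.
embedding_head = nn.Conv2d(256, 16, kernel_size=8)  # (N, 256, 8, 8) -> (N, 16, 1, 1)

z = embedding_head(encoder(x))
print(z.shape)  # torch.Size([1, 16, 1, 1]); with batch_size = 256 this is (256, 16, 1, 1)

The decoder would then need a matching first layer, e.g. nn.ConvTranspose2d(16, 256, kernel_size=8), to get back to 256 * 8 * 8 before the existing transposed convolutions.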

===========EDIT===========

After the dialog in the comments below, I'll leave this code here to inspire you, I hope (and tell me if it works):

from torch import nn
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

data = datasets.MNIST(root='data', train=True, download=True, transform=ToTensor())

class AutoEncoderCNN(nn.Module):
  def __init__(self):
    super(AutoEncoderCNN, self).__init__()
    self.encoder = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=5, stride=1),
        nn.ReLU(),
    )
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(32, 1, kernel_size=5, stride=1),
        nn.Sigmoid()      
    )
          
  def forward(self, x):
      x = self.encoder(x)
      x = self.decoder(x)
      return x
  
model = AutoEncoderCNN()
mnistTrainLoader = DataLoader(data,
                              batch_size=32, shuffle=True, num_workers=0)

loss_function = nn.MSELoss(reduction='mean')
optimizer =  torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
i = 0
running_loss = .0
for epoch in range(100):
  for features, _ in mnistTrainLoader:
    y = model(features)
    loss = loss_function(y, features)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if i % 10 == 9:
        print('[Epoch: %d, iteration: %5d] loss: %.3f' %
              (epoch + 1, i + 1, running_loss / 10))
        running_loss = 0.0
    i += 1
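Note that because this DataLoader is built on the dataset created with transform=ToTensor(), each features batch already has shape (batch_size, 1, 28, 28) with values in [0, 1], so the convolutions receive the channel dimension they expect.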

=======Adding a channel dimension=======

The problem was actually in how the dataset was created: since the dataset contains grayscale images, the PyTorch MNIST dataset helper returns the images without a channel dimension. Convolutions need this dimension, so we have to add it.

Instead of loading the dataset this way:

X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor()).data
print(X_train.shape) # torch.Size([60000, 28, 28])

We load it this way:

X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:]/255.
# /255. to have floats between 0 and 1 instead of unsigned int
print(X_train.shape) # torch.Size([60000, 1, 28, 28])

Another way to handle this problem is in the model class, by adding the channel dimension to the input x, as sketched below.
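A minimal sketch of that alternative, assuming the input can arrive as (batch_size, 28, 28):

def forward(self, x):
    # If the input comes in without a channel axis, insert one so the
    # convolutions see (batch_size, 1, 28, 28) instead of (batch_size, 28, 28).
    if x.dim() == 3:
        x = x.unsqueeze(1)
    x = self.encoder(x)
    x = self.decoder(x)
    return x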


Comments

  • I have corrected this, but the error is now: RuntimeError: Given groups=1, weight of size [16, 1, 5, 5], expected input[1, 256, 28, 28] to have 1 channels, but got 256 channels instead. embedding_dim represents the latent space of my network.
  • Can you please print(features.shape) before passing them to model(features)?
  • Add a transform: torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor()) and tell me if it fixes it; otherwise, give me the features.shape.
  • You have to load the data this way to add the channel dimension: X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:].float() and X_test = torchvision.datasets.MNIST(root='./data', train=False, download=True).data[:,None,:,:].float(). If you load the data this way and use the model as presented, it will work. I've just tried it with your notebook (you have to delete the flattening in the forward). You should also reconsider the way you're displaying images, since we added a dimension: what you were using as X_train[i] should now be X_train[i][0].
  • You have to divide by 255. since you used a sigmoid in your model. So: X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:]/255. and X_test = torchvision.datasets.MNIST(root='./data', train=False, download=True).data[:,None,:,:]/255.
