
Hi, I have a project where I need to create a convolutional autoencoder trained on the MNIST dataset, with the constraint that I must not use pooling. My embedding dim is 16, and I need a 256 * 16 * 1 * 1 tensor as the output of my encoder.

I have written the following class to define my encoder:

class AutoEncoderCNN(nn.Module):
    def __init__(self, nb_channels, embedding_dim):
        super(AutoEncoderCNN, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=5, stride=1),
            nn.Sigmoid()
        )

    def encode(self, x):
        x = self.encoder(x)  # TO COMPLETE
        return x

    def decode(self, x):
        x = self.decoder(x)  # TO COMPLETE
        return x

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

But I get this dimension error when I try to train my network:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 256, 28, 28] to have 1 channels, but got 256 channels instead

My loss function:

loss_function = nn.MSELoss(size_average=None, reduce=None, reduction='mean')

My optimizer:

optimizer = optim.Adam(modelcnn.parameters(), lr=learning_rate)

My dataloader:

mnistTrainLoader = DataLoader(mnistTrainSet_clean, batch_size=batch_size, shuffle=True, num_workers=0)

My training loop:

# Training procedure for the model, using a dataloader, an optimizer, and a number of epochs
def train(model, data_loader, opt, n_epochs):
    losses = []
    i = 0
    for epoch in range(n_epochs):  # Loop over the epochs
        running_loss = 0.0

        for features, labels in data_loader:

            # TO COMPLETE
            # Forward pass
            labels_pred = model(features)  # Equivalent to model.forward(features)

            # Compute the loss
            loss = loss_function(labels_pred, labels)

            # Save the loss for later plotting
            losses.append(loss.item())

            # Clear the previous gradients
            optimizer.zero_grad()

            # Compute the gradients (backpropagation)
            loss.backward()

            # Update the weights: one optimizer step
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 10 == 9:
                print('[Epoch: %d, iteration: %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 10))
                running_loss = 0.0
            i += 1

    print('Training finished')
    return losses

I have tried many things to solve it, but nothing works. Can anyone help me, please?

  • Hey, could you post more code? Of your dataloader and train loop? Commented Jan 24, 2023 at 10:47
  • How do you load your dataset, and how do you pass the data? It could be that you are not using batches and are thereby lacking a dimension. Unrelated for now: also check out the last conv layer of the encoder. Commented Jan 24, 2023 at 10:48
  • @Daraan I have modified my question; you can see how I load my dataset. Commented Jan 24, 2023 at 11:36
  • @TheodorPeifer I have corrected my question; you can see this information now. Commented Jan 24, 2023 at 11:37

1 Answer


In the encoder, you're repeating:

nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU(),
nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU()

Just delete the duplication, and shapes will fit.

Note: as the output of your encoder, you'll have a shape of batch_size * 256 * h' * w'. 256 is the number of channels output by the last convolution in the encoder, and h', w' depend on the size h, w of the input image after it passes through the convolutional layers.
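For example, you can check h' and w' empirically with a dummy input. A minimal sketch, assuming 28 * 28 MNIST images and the deduplicated encoder from the question:

import torch
from torch import nn

# The deduplicated encoder from the question: five 5x5 convs, stride 1, no padding.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=5, stride=1), nn.ReLU(),
)

# Each 5x5 conv with stride 1 and no padding shrinks H and W by 4:
# 28 -> 24 -> 20 -> 16 -> 12 -> 8
x = torch.zeros(1, 1, 28, 28)  # one dummy MNIST image: (N, C, H, W)
print(encoder(x).shape)        # torch.Size([1, 256, 8, 8])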

You're not using nb_channels and embedding_dim anywhere. And I can't see what you mean by embedding_dim, since you're only using convolutions and no fully connected layers.
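If you really need a batch_size * 16 * 1 * 1 embedding without pooling, one option is a final convolution whose kernel covers the whole remaining feature map. This is only a sketch, continuing from the dummy-input example above (the name embedding_head is hypothetical, and I'm assuming 28 * 28 inputs so the encoder ends at 256 * 8 * 8):

# Hypothetical embedding head: collapses the 8x8 feature map to 1x1
# and maps 256 channels down to embedding_dim = 16.
embedding_head = nn.Conv2d(256, 16, kernel_size=8)  # (N, 256, 8, 8) -> (N, 16, 1, 1)

z = embedding_head(encoder(x))
print(z.shape)  # torch.Size([1, 16, 1, 1]); with batch_size = 256 this is (256, 16, 1, 1)

The decoder would then need a matching first layer, e.g. nn.ConvTranspose2d(16, 256, kernel_size=8), to get back to 256 * 8 * 8 before the existing transposed convolutions.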

===========EDIT===========

After the dialog in the comments below, I'll leave this code here to inspire you, I hope (and tell me if it works):

from torch import nn
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

data = datasets.MNIST(root='data', train=True, download=True, transform=ToTensor())

class AutoEncoderCNN(nn.Module):
  def __init__(self):
    super(AutoEncoderCNN, self).__init__()
    self.encoder = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=5, stride=1),
        nn.ReLU(),
    )
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(32, 1, kernel_size=5, stride=1),
        nn.Sigmoid()      
    )
          
  def forward(self, x):
      x = self.encoder(x)
      x = self.decoder(x)
      return x
  
model = AutoEncoderCNN()
mnistTrainLoader = DataLoader(data,
                              batch_size=32, shuffle=True, num_workers=0)

loss_function = nn.MSELoss(reduction='mean')
optimizer =  torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
i = 0
running_loss = .0
for epoch in range(100):
  for features, _ in mnistTrainLoader:
    y = model(features)
    loss = loss_function(y, features)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if i % 10 == 9:
        print('[Epoch: %d, iteration: %5d] loss: %.3f' %
              (epoch + 1, i + 1, running_loss / 10))
        running_loss = 0.0
    i += 1
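Note that because this DataLoader is built on the dataset created with transform=ToTensor(), each features batch already has shape (batch_size, 1, 28, 28) with values in [0, 1], so the convolutions receive the channel dimension they expect.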

=======Adding a channel dimension=======

The problem was actually in how the dataset was created: since the dataset contains grayscale images, the PyTorch MNIST dataset helper returns the images without a channel dimension. Convolutions need this dimension, so we have to add it.

Instead of loading the dataset this way:

X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor()).data
print(X_train.shape) # torch.Size([60000, 28, 28])

We load it this way:

X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:]/255.
# /255. to have floats between 0 and 1 instead of unsigned int
print(X_train.shape) # torch.Size([60000, 1, 28, 28])

Another way to handle this problem is in the model class, by adding the channel dimension to the input x, as sketched below.
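A minimal sketch of that alternative, assuming the input can arrive as (batch_size, 28, 28):

def forward(self, x):
    # If the input comes in without a channel axis, insert one so the
    # convolutions see (batch_size, 1, 28, 28) instead of (batch_size, 28, 28).
    if x.dim() == 3:
        x = x.unsqueeze(1)
    x = self.encoder(x)
    x = self.decoder(x)
    return x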


Comments

  • I have corrected this, but the error is now: RuntimeError: Given groups=1, weight of size [16, 1, 5, 5], expected input[1, 256, 28, 28] to have 1 channels, but got 256 channels instead. embedding_dim represents the latent space of my network.
  • Can you please print(features.shape) before passing them to model(features)?
  • Add a transform: torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor()) and tell me if it fixes it; otherwise, give me the features.shape.
  • You have to load the data this way to add the channel dimension: X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:].float() and X_test = torchvision.datasets.MNIST(root='./data', train=False, download=True).data[:,None,:,:].float(). If you load the data this way and use the model as presented, it will work. I've just tried it with your notebook (you have to delete the flattening in the forward). You should also reconsider the way you're displaying images, since we added a dimension: what you were using as X_train[i] should now be X_train[i][0].
  • You have to divide by 255. since you used a sigmoid in your model. So: X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:]/255. and X_test = torchvision.datasets.MNIST(root='./data', train=False, download=True).data[:,None,:,:]/255.
