I am experimenting with a neural network (PyTorch) and I get this error.

RuntimeError: invalid argument 2: size '[32 x 9216]' is invalid for input with 8192 elements at /pytorch/aten/src/TH/THStorage.cpp:84

My task is image classification with AlexNet, and I have traced the error to the size of the images supplied to the neural network. My question is: given the network architecture and its parameters, how does one determine the image size the network requires?

As shown in the code below, I transform the training images before feeding them into the network. I noticed that the network only accepts images of size 224; any other size I tried produces the error above. For instance, my instinct was to apply transforms.RandomResizedCrop with a size of 64, but apparently this is wrong. Is there a formula to determine the required size?

Code

import torch.nn as nn
from torchvision import transforms

# Transformations applied to the training images
transform_train = transforms.Compose([
    transforms.RandomResizedCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x

2 Answers

2

I have figured out the formula for determining the right input size.

Out = float(((W−F+2P)/S)+1)

where

  • Out = Output shape
  • W = Image volume size (image size)
  • F = Receptive field (filter size)
  • P = Padding
  • S = Stride

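Working backward from the 256 * 6 * 6 input that the classifier expects, the first convolution layer must output a 55 x 55 feature map. Plugging in that layer's hyperparameters (F = 11, S = 4, P = 2), the required image size is

W = (55 - 1) * 4 - 2(2) + 11
  = 223
  ≈ 224

(224 gives the same output size of 55 once the division is rounded down.)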
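
The same backward calculation can be repeated for every layer in self.features, not just the first one. The following is a minimal sketch in plain Python, assuming the (kernel, stride, padding) values from the question's AlexNet; the names min_input_size, required_input, and ALEXNET_FEATURES are made up for illustration and are not part of PyTorch.

```python
def min_input_size(out_size, kernel, stride=1, padding=0):
    """Invert Out = (W - F + 2P) / S + 1 to get the smallest W giving out_size."""
    return (out_size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for each Conv2d / MaxPool2d in self.features.
# ReLU layers are omitted because they do not change the spatial size.
ALEXNET_FEATURES = [
    (11, 4, 2),  # Conv2d(3, 64)
    (3, 2, 0),   # MaxPool2d
    (5, 1, 2),   # Conv2d(64, 192)
    (3, 2, 0),   # MaxPool2d
    (3, 1, 1),   # Conv2d(192, 384)
    (3, 1, 1),   # Conv2d(384, 256)
    (3, 1, 1),   # Conv2d(256, 256)
    (3, 2, 0),   # MaxPool2d
]


def required_input(final_size, layers):
    """Walk the layers in reverse, from the final feature map to the image."""
    size = final_size
    for kernel, stride, padding in reversed(layers):
        size = min_input_size(size, kernel, stride, padding)
    return size


# The classifier expects 256 * 6 * 6 features, so the final map is 6 x 6.
print(required_input(6, ALEXNET_FEATURES))  # 223
```

Because the output size is rounded down at each layer, 223 is only the smallest size that works; slightly larger inputs such as 224 produce the same 6 x 6 feature map.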

1

The actual formula for the output size after a convolution layer is:

out_size= floor((in_size + 2p -f)/s + 1)
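
As a rough check, this formula can be applied layer by layer to see what the network in the question produces for a given image size. The sketch below is plain Python and assumes the layer hyperparameters from the question; conv_out, feature_map_size, and LAYERS are hypothetical helper names, and the same formula is applied to the MaxPool2d layers with padding 0.

```python
def conv_out(in_size, kernel, stride=1, padding=0):
    """out_size = floor((in_size + 2p - f) / s + 1), using integer division."""
    return (in_size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each Conv2d / MaxPool2d in self.features.
LAYERS = [
    (11, 4, 2),  # Conv2d(3, 64)
    (3, 2, 0),   # MaxPool2d
    (5, 1, 2),   # Conv2d(64, 192)
    (3, 2, 0),   # MaxPool2d
    (3, 1, 1),   # Conv2d(192, 384)
    (3, 1, 1),   # Conv2d(384, 256)
    (3, 1, 1),   # Conv2d(256, 256)
    (3, 2, 0),   # MaxPool2d
]


def feature_map_size(in_size, layers):
    """Spatial size of the feature map after all layers, for a square input."""
    size = in_size
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


for image_size in (224, 64):
    side = feature_map_size(image_size, LAYERS)
    # 256 is the channel count of the last convolution layer.
    print(image_size, side, 256 * side * side)
# 224 6 9216
# 64 1 256
```

With 64 x 64 images, each sample flattens to only 256 values, so a batch of 32 holds 32 * 256 = 8192 elements. That matches the error message, where the view call asks for [32 x 9216].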
