I've been running a ResNet50 PyTorch script on Colab for nine months. I hadn't run the script for about three weeks, and now I get the following error: `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`
My script barfs on this Colab cell:
```python
# Train the model
if IS_TRAINING:
    # Create the model
    mymodel = torchvision.models.resnet50(weights='ResNet50_Weights.DEFAULT')
    n_features = mymodel.fc.in_features
    # Replace the last layer by our own Linear layer
    mymodel.fc = DisMaxLossFirstPart(n_features, len(class_names))
    mymodel = mymodel.to(device)
    criterion = DisMaxLossSecondPart(mymodel.fc)
    optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
    mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
        mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
    )
```
With this error:
```
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100% 97.8M/97.8M [00:00<00:00, 308MB/s]
Epoch 0/3
----------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-01c27b985121> in <module>
     12     optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
     13 
---> 14     mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
     15         mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
     16     )

2 frames

<ipython-input-19-c213dabd46bb> in forward(self, logits, targets, debug, precompute_thresholds)
     87         num_classes = logits.size(1)
     88         half_batch_size = batch_size//2
---> 89         targets_one_hot = torch.eye(num_classes)[targets].long().cuda()
     90 
     91         if self.model_classifier.training:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
```
As described above, I am running a saved ResNet50 model using PyTorch on Colab. The script used to run without a problem; all I have changed today is the input set of images. I've also ruled that change out by running the last version of my script that worked for me, and it barfs in the same way.
I am specifying the device to be used as follows:
```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
Shouldn't this be forcing everything on Colab to be on the GPU?
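For context on that question: as far as I understand PyTorch's behaviour, a `torch.device` object does not change where new tensors are created. It only takes effect where it is passed explicitly, as in `mymodel.to(device)`. Factory functions such as `torch.eye` still allocate on the CPU unless they receive a `device=` argument, which would explain why `torch.eye(num_classes)` in the traceback is a CPU tensor while `targets` is (presumably) on the GPU. A minimal sketch, assuming no global default device has been configured:

```python
import torch

# Same selection logic as in the question (device strings are lowercase).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Creating a `device` variable does not affect factory functions:
# with no `device=` argument, torch.eye allocates on the CPU.
eye_default = torch.eye(4)

# Passing the device explicitly is what actually places the tensor on the GPU.
eye_on_device = torch.eye(4, device=device)

print(eye_default.device)    # cpu
print(eye_on_device.device)  # cuda:0 on a GPU runtime, cpu otherwise
```

On a GPU runtime the two tensors end up on different devices, which is exactly the mismatch the error message complains about.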
I saw that someone else had a similar problem with YOLOv7, but I don't have any code like `from_which_layer.append((torch.ones(size=(len(b),)) * i)`.
What device is `targets` on? It might be that you need to change that line to `torch.eye(num_classes, device=targets.device)[targets].long()`.
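A sketch of that suggestion as a standalone function, assuming `targets` is a 1-D int64 tensor of class indices. The helper names `one_hot_targets` and `one_hot_targets_builtin` are hypothetical and not part of the original DisMax code. The first builds the identity matrix on `targets.device` before indexing it, so the indices and the indexed tensor always share a device. The second uses `torch.nn.functional.one_hot`, which returns an int64 tensor on the same device as its input, as an equivalent alternative.

```python
import torch
import torch.nn.functional as F


def one_hot_targets(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper mirroring the failing `targets_one_hot` line, made device-safe."""
    # Build the identity matrix on the same device as `targets`, then index it,
    # so the indices and the indexed tensor never live on different devices.
    return torch.eye(num_classes, device=targets.device)[targets].long()


def one_hot_targets_builtin(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical alternative using the built-in one-hot encoder."""
    # F.one_hot expects int64 class indices and returns int64 on targets.device.
    return F.one_hot(targets, num_classes=num_classes)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
targets = torch.tensor([2, 0, 1], device=device)

manual = one_hot_targets(targets, num_classes=3)
builtin = one_hot_targets_builtin(targets, num_classes=3)
print(manual)
```

With either version, the trailing `.cuda()` from the original line should be redundant on a GPU runtime, since the result already sits on `targets.device`.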