I've been running a ResNet-50 PyTorch script on Colab for nine months. I hadn't run the script for about three weeks, and now I get the following error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

My script barfs on this colab cell:

# Train the model
if IS_TRAINING:
    # Create the model
    mymodel = torchvision.models.resnet50(weights='ResNet50_Weights.DEFAULT')
    n_features = mymodel.fc.in_features

    # Replace the last layer by our own Linear layer
    mymodel.fc = DisMaxLossFirstPart(n_features, len(class_names))
    mymodel = mymodel.to(device)

    criterion = DisMaxLossSecondPart(mymodel.fc)
    optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)

    mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
        mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
    )

With this error:

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100% 97.8M/97.8M [00:00<00:00, 308MB/s]
Epoch 0/3
----------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-01c27b985121> in <module>
     12     optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
     13 
---> 14     mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
     15         mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
     16     )

2 frames
<ipython-input-19-c213dabd46bb> in forward(self, logits, targets, debug, precompute_thresholds)
     87         num_classes = logits.size(1)
     88         half_batch_size = batch_size//2
---> 89         targets_one_hot = torch.eye(num_classes)[targets].long().cuda()
     90 
     91         if self.model_classifier.training:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

As described above, I am running a saved ResNet-50 model using PyTorch on Colab. The script used to run without a problem. All I have done today is change the input set of images. I've also verified this by running the last version of my script that worked for me, and it fails in the same way.

I am specifying the device to be used as follows:

device = torch.device("cuda:0" if torch.cuda.is_available() else "CPU")

Shouldn't this be forcing everything on colab to be on the GPU?

I saw that someone else had a similar problem with YOLOv7, but my code doesn't contain anything like "from_which_layer.append((torch.ones(size=(len(b),)) * i))".

  • Which device are targets on? It might be that you need to change that line to torch.eye(num_classes,device=targets.device).long() Commented Dec 23, 2022 at 6:03
  • I didn't realize when I originally posted that I had a device = blah blah blah line. I edited my post and added that. Here is the line: device = torch.device("cuda:0" if torch.cuda.is_available() else "CPU") . Is this what you mean? If not, can you tell me how to find out what you are asking? Commented Dec 23, 2022 at 7:43
  • okay, I found this in my script: targets_one_hot_0 = torch.eye(num_classes)[torch.roll(targets[half_batch_size:], 0, 0)].long().cuda() I would have thought that since it is using .long().cuda() that this would do the right thing. Commented Dec 23, 2022 at 7:52
  • jhso's recommendation appears to have fixed this problem. I explicitly added device="cuda" inside of the torch.eye() Commented Dec 23, 2022 at 21:50

1 Answer

As jhso suggested in the comments, the fix is to pass the device to torch.eye() in this line:

    targets_one_hot = torch.eye(num_classes)[targets].long().cuda()

so I changed it to this:

    targets_one_hot = torch.eye(num_classes, device="cuda")[targets].long().cuda()

And that works. The trailing .cuda() is now redundant, since the tensor is already created on the GPU.
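For anyone wondering why the original line failed: torch.device(...) only creates a device object, it does not move anything by itself, and torch.eye() allocates on the CPU unless a device is passed. The .cuda() at the end of the chain runs only after the indexing, so the indexing step sees a CPU matrix and CUDA indices. Here is a minimal sketch of a device-agnostic version that follows the tensor's own device instead of hard-coding "cuda". The helper name one_hot_targets and the sample targets are made up for illustration, and the F.one_hot comparison is an alternative I am suggesting, not part of the original DisMax code. Note also that device strings are lowercase as far as I know, so the fallback should be "cpu" rather than "CPU".

```python
import torch
import torch.nn.functional as F


def one_hot_targets(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: one-hot encode class indices on the indices' own device."""
    # Build the identity matrix on the same device as the indices, so the
    # indexing never mixes a CPU tensor with CUDA indices.
    eye = torch.eye(num_classes, device=targets.device)
    return eye[targets].long()


# Falls back to the CPU when no GPU is available (note the lowercase "cpu").
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Made-up sample labels, created directly on the chosen device.
targets = torch.tensor([0, 2, 1], device=device)

one_hot = one_hot_targets(targets, num_classes=3)

# Built-in alternative: F.one_hot returns an int64 tensor on the input's device.
alt = F.one_hot(targets, num_classes=3)

print(one_hot.device, torch.equal(one_hot, alt))
```

Using targets.device rather than "cuda" means the same line should keep working if the notebook is ever run on a CPU-only runtime.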
