pytorch on colab "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)"

Question

I've been running a resnet50 pytorch script on colab for nine months. I hadn't run the script for about three weeks, and now I get the following error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

My script barfs on this colab cell:

# Train the model
if IS_TRAINING:
    # Create the model
    mymodel = torchvision.models.resnet50(weights='ResNet50_Weights.DEFAULT')
    n_features = mymodel.fc.in_features

    # Replace the last layer by our own Linear layer
    mymodel.fc = DisMaxLossFirstPart(n_features, len(class_names))
    mymodel = mymodel.to(device)

    criterion = DisMaxLossSecondPart(mymodel.fc)
    optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)

    mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
        mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
    )

With this error:

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%
97.8M/97.8M [00:00<00:00, 308MB/s]
Epoch 0/3
----------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-01c27b985121> in <module>
     12     optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
     13 
---> 14     mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
     15         mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
     16     )

2 frames
<ipython-input-19-c213dabd46bb> in forward(self, logits, targets, debug, precompute_thresholds)
     87         num_classes = logits.size(1)
     88         half_batch_size = batch_size//2
---> 89         targets_one_hot = torch.eye(num_classes)[targets].long().cuda()
     90 
     91         if self.model_classifier.training:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

As described above, I am running a resnet50 saved model using pytorch on colab. The script used to run without a problem, and the only thing I changed today is the input set of images. I've ruled that change out by running the last version of my script that worked for me, and it barfs in the same way.

I am specifying the device to be used as follows:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Shouldn't this be forcing everything on colab to be on the GPU?
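A minimal sketch of why the `device` line alone does not force everything onto the GPU: `torch.device(...)` only creates a device object, and nothing moves there unless it is passed explicitly (`.to(device)` or a `device=` argument). `mymodel.to(device)` moves the model's parameters and buffers, but a tensor created inside `forward` with `torch.eye(num_classes)` is still allocated on the default device, which is the CPU unless it has been changed. The snippet runs on either a CPU or a GPU runtime.

```python
import torch

# Same selection logic as above, using the lowercase "cpu" device string.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Factory functions ignore the `device` variable unless it is passed in explicitly.
eye_default = torch.eye(3)                   # default device (the CPU unless changed)
eye_on_device = torch.eye(3, device=device)  # allocated on `device`

print(eye_default.device)    # cpu
print(eye_on_device.device)  # cuda:0 on a GPU runtime, cpu otherwise
```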

I saw that someone else had a similar problem with YOLOv7, but I don't have any code like "from_which_layer.append((torch.ones(size=(len(b),)) * i)"

Which device are targets on? It might be that you need to change that line to torch.eye(num_classes,device=targets.device).long() – jhso, Commented Dec 23, 2022 at 6:03
I didn't realize when I originally posted that I had a device = blah blah blah line. I edited my post and added it. Here is the line: device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu"). Is this what you mean? If not, can you tell me how to find out what you are asking? – marlon1492, Commented Dec 23, 2022 at 7:43
Okay, I found this in my script: targets_one_hot_0 = torch.eye(num_classes)[torch.roll(targets[half_batch_size:], 0, 0)].long().cuda() I would have thought that since it uses .long().cuda(), this would do the right thing. – marlon1492, Commented Dec 23, 2022 at 7:52
jhso's recommendation appears to have fixed this problem. I explicitly added device="cuda" inside the torch.eye() call. – marlon1492, Commented Dec 23, 2022 at 21:50
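The `.long().cuda()` comment points at the cause: method calls run left to right, so `torch.eye(num_classes)[targets]` indexes a CPU tensor with `targets` (which, judging by the error, are on the GPU) before `.cuda()` ever runs. Creating the identity matrix on the same device as `targets`, as jhso suggested, avoids the mismatch. A minimal sketch, where `one_hot_via_eye` is a hypothetical helper name and printing `targets.device` is one way to answer "Which device are targets on?":

```python
import torch

def one_hot_via_eye(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: one-hot encode `targets` on their own device."""
    # Allocate the identity matrix next to the indices, so indexing never mixes devices.
    eye = torch.eye(num_classes, device=targets.device)
    # Row i of the identity matrix is the one-hot vector for class i.
    return eye[targets].long()

targets = torch.tensor([2, 0, 1])
print(targets.device)  # where the indices live, e.g. cuda:0 during GPU training
print(one_hot_via_eye(targets, num_classes=3))
# tensor([[0, 0, 1],
#         [1, 0, 0],
#         [0, 1, 0]])
```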

marlon1492 · Accepted Answer · 2022-12-24 20:48:08Z


As jhso said, I needed to specify the device in this line:

    targets_one_hot = torch.eye(num_classes)[targets].long().cuda()

so I changed it to this:

    targets_one_hot = torch.eye(num_classes, device="cuda")[targets].long().cuda()

And that works.
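Two notes on this fix. Once `torch.eye` is created with `device="cuda"`, the trailing `.cuda()` is redundant on a single-GPU runtime like Colab's; using `device=targets.device` instead keeps the line working on a CPU-only runtime too. The same change applies to the `targets_one_hot_0` line quoted in the comments. As an alternative (not what the original script uses), `torch.nn.functional.one_hot` builds an int64 one-hot tensor directly on the device of its input, provided the targets are an int64 tensor. A sketch, with `make_one_hot_targets` as a hypothetical wrapper around the two lines:

```python
import torch
import torch.nn.functional as F

def make_one_hot_targets(targets: torch.Tensor, num_classes: int, half_batch_size: int):
    """Hypothetical wrapper for the two one-hot lines from the question and comments."""
    # Device-agnostic version of: torch.eye(num_classes)[targets].long().cuda()
    targets_one_hot = torch.eye(num_classes, device=targets.device)[targets].long()
    # Same idea with F.one_hot; the result is int64 and stays on targets.device.
    targets_one_hot_0 = F.one_hot(
        torch.roll(targets[half_batch_size:], 0, 0), num_classes=num_classes
    )
    return targets_one_hot, targets_one_hot_0

targets = torch.tensor([1, 0, 2, 1])  # int64 class indices
full, second_half = make_one_hot_targets(targets, num_classes=3, half_batch_size=2)
print(second_half)
# tensor([[0, 0, 1],
#         [0, 1, 0]])
```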
