I set up Jupyter Notebook on my Windows 10 machine as a server and am running some training jobs on it.

I've installed CUDA 9.0 and cuDNN properly, and Python detects the GPU. This is what I get at the Anaconda prompt:

>>> torch.cuda.get_device_name(0)
'GeForce GTX 1070'

I also moved my model and tensors to the GPU with .cuda():

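import torch
from torch.utils.data import DataLoader
from sklearn.metrics import mean_squared_error
from tqdm import tqdm_notebook

# LogPPredictor and args are defined elsewhere in the notebook
model = LogPPredictor(1, 58, 64, 128, 1, 'gsc')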

if torch.cuda.is_available():
    torch.set_default_tensor_type(torch.cuda.DoubleTensor)
    model.cuda()
else:
    torch.set_default_tensor_type(torch.FloatTensor)

list_train_loss = list()
list_val_loss = list()
acc = 0
mse = 0

optimizer = args.optim(model.parameters(),
                       lr=args.lr,
                       weight_decay=args.l2_coef)

data_train = DataLoader(args.dict_partition['train'], 
                        batch_size=args.batch_size,
                        pin_memory=True,
                        shuffle=args.shuffle)

data_val = DataLoader(args.dict_partition['val'],
                     batch_size=args.batch_size,
                     pin_memory=True,
                     shuffle=args.shuffle)

for epoch in tqdm_notebook(range(args.epoch), desc='Epoch'):
    model.train()
    epoch_train_loss = 0
    for i, batch in enumerate(data_train):
        list_feature = torch.tensor(batch[0]).cuda()
        list_adj = torch.tensor(batch[1]).cuda()
        list_logP = torch.tensor(batch[2]).cuda()
        list_logP = list_logP.view(-1,1)

        optimizer.zero_grad()
        list_pred_logP = model(list_feature, list_adj)
        train_loss = args.criterion(list_pred_logP, list_logP)
        epoch_train_loss += train_loss.item()
        train_loss.backward()
        optimizer.step()

    list_train_loss.append(epoch_train_loss/len(data_train))

    model.eval()
    epoch_val_loss = 0
    with torch.no_grad():
        for i, batch in enumerate(data_val):
            list_feature = torch.tensor(batch[0]).cuda()
            list_adj = torch.tensor(batch[1]).cuda()
            list_logP = torch.tensor(batch[2]).cuda()
            list_logP = list_logP.view(-1,1)


            list_pred_logP = model(list_feature, list_adj)
            val_loss = args.criterion(list_pred_logP, list_logP)
            epoch_val_loss += val_loss.item()

    list_val_loss.append(epoch_val_loss/len(data_val))

data_test = DataLoader(args.dict_partition['test'],
                   batch_size=args.batch_size,
                   pin_memory=True,
                   shuffle=args.shuffle)

model.eval()
with torch.no_grad():
    logP_total = list()
    pred_logP_total = list()
    for i, batch in enumerate(data_test):
        list_feature = torch.tensor(batch[0]).cuda()
        list_adj = torch.tensor(batch[1]).cuda()
        list_logP = torch.tensor(batch[2]).cuda()
        logP_total += list_logP.tolist()
        list_logP = list_logP.view(-1,1)

        list_pred_logP = model(list_feature, list_adj)

        pred_logP_total += list_pred_logP.tolist()

mse = mean_squared_error(logP_total, pred_logP_total)

But in the Windows Task Manager, whenever I start training, only CPU usage goes up (to about 25%) while GPU usage stays at 0. How can I fix this?

  • I installed PyTorch first, then CUDA and cuDNN. Do I have to reinstall PyTorch in my conda environment? Commented Oct 29, 2018 at 11:30
  • I've reinstalled PyTorch, but nothing changed. Commented Oct 29, 2018 at 17:27
  • Did you find a solution to this? I'm having the same problem. Commented Jan 6, 2019 at 1:53
  • I also have the same problem. Isn't there any solution for this? Commented Oct 30, 2019 at 21:46
  • I have the same problem. Any updates? Commented Oct 3, 2020 at 3:19

1 Answer

I had a similar problem using PyTorch with CUDA. After looking for possible solutions, I found the following post by Soumith himself, which I found very helpful.

https://discuss.pytorch.org/t/gpu-supposed-to-be-used-but-isnt/2883

The bottom line is that, at least in my case, I could not put enough load on the GPU: there was a bottleneck elsewhere in my application. Try another example or increase the batch size; it should then be OK.
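
To check whether the GPU is actually doing work, a rough sketch like the one below may help. It is not the asker's model: `toy_model`, `describe_placement`, and `time_forward` are hypothetical names made up for illustration, and the layer sizes are arbitrary. It confirms which device the parameters and inputs live on, then times a forward pass at two batch sizes. It falls back to the CPU when CUDA is unavailable.

```python
import time

import torch
import torch.nn as nn


def describe_placement(model, *tensors):
    """Return the set of device types used by the model parameters and tensors."""
    devices = {p.device.type for p in model.parameters()}
    devices.update(t.device.type for t in tensors)
    return devices


def time_forward(model, batch, repeats=10):
    """Average forward-pass time in seconds.

    CUDA kernels run asynchronously, so we synchronize before reading the
    clock; otherwise the measurement only covers the kernel launches.
    """
    use_cuda = batch.is_cuda
    with torch.no_grad():
        model(batch)  # warm-up pass, excluded from the timing
        if use_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(repeats):
            model(batch)
        if use_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-in network; replace with your own model and batches.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = nn.Sequential(nn.Linear(58, 128), nn.ReLU(), nn.Linear(128, 1)).to(device)
small = torch.randn(16, 58, device=device)
large = torch.randn(4096, 58, device=device)

placement = describe_placement(toy_model, small, large)
print("devices in use:", placement)
print("small batch:", time_forward(toy_model, small))
print("large batch:", time_forward(toy_model, large))
```

If the reported device is `cuda` and the large batch takes about as long as the small one, the GPU is probably underloaded and the time is going elsewhere, for example into data loading. Also, as far as I know, the Task Manager GPU graphs default to engines such as 3D and Copy, so CUDA work may only show up after switching a graph to a compute engine, or in `nvidia-smi`.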
