
I am getting the following error while doing seq2seq on characters: the characters are fed to an LSTM and decoded to words using attention. Forward propagation is fine, but when computing loss.backward() I get the following error.

RuntimeError: Gradients aren't CUDA tensors

My train() function is as follows.

def train(input_batch, input_batch_length, target_batch, target_batch_length, batch_size):

    # Zero gradients of both optimizers
    encoderchar_optimizer.zero_grad()
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    encoder_input = Variable(torch.FloatTensor(len(input_batch), batch_size, 500))

    for ix, w in enumerate(input_batch):
        w = w.contiguous().view(15, batch_size)
        reshaped_input_length = [x[ix] for x in input_batch_length] # [15 ,.. 30 times] * 128
        if USE_CUDA: 
            w = w.cuda()
            #reshaped_input_length =  Variable(torch.LongTensor(reshaped_input_length)).cuda()
        hidden_all , output = encoderchar(w, reshaped_input_length)
        encoder_input[ix] = output.transpose(0,1).contiguous().view(batch_size, -1)
        if USE_CUDA: 
            encoder_input = encoder_input.cuda()

    temporary_target_batch_length = [15] * batch_size

    encoder_hidden_all, encoder_output = encoder(encoder_input, target_batch_length)
    decoder_input = Variable(torch.LongTensor([SOS_token] * batch_size))
    decoder_hidden = encoder_output

    max_target_length = max(temporary_target_batch_length)
    all_decoder_outputs = Variable(torch.zeros(max_target_length, batch_size, decoder.output_size))

    # Move new Variables to CUDA
    if USE_CUDA:
        decoder_input = decoder_input.cuda()
        all_decoder_outputs = all_decoder_outputs.cuda()
        target_batch = target_batch.cuda()

    # Run through decoder one time step at a time
    for t in range(max_target_length):
        decoder_output, decoder_hidden, decoder_attn = decoder(
            decoder_input, decoder_hidden, encoder_hidden_all
        )

        all_decoder_outputs[t] = decoder_output
        decoder_input = target_batch[t] # Next input is current target
        if USE_CUDA:
            decoder_input = decoder_input.cuda()

    # Loss calculation and backpropagation
    loss = masked_cross_entropy(
        all_decoder_outputs.transpose(0, 1).contiguous(), # -> batch x seq
        target_batch.transpose(0, 1).contiguous(), # -> batch x seq
        target_batch_length
    )
    loss.backward()

    # Clip gradient norms
    ecc = torch.nn.utils.clip_grad_norm(encoderchar.parameters(), clip)
    ec = torch.nn.utils.clip_grad_norm(encoder.parameters(), clip)
    dc = torch.nn.utils.clip_grad_norm(decoder.parameters(), clip)

    # Update parameters with optimizers
    encoderchar_optimizer.step()
    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.data[0], ec, dc

The full stack trace is here:

RuntimeError                              Traceback (most recent call last)
<ipython-input-10-9778e12ded02> in <module>()
     11         data_target_batch_index= Variable(torch.LongTensor(data_target_batch_index)).transpose(0,1)
     12         # Send the data for training
---> 13         loss, ar1, ar2 = train(data_input_batch_index, data_input_batch_length, data_target_batch_index, data_target_batch_length, batch_size)
     14 
     15         # Keep track of loss

<ipython-input-8-9c71c385f8cd> in train(input_batch, input_batch_length, target_batch, target_batch_length, batch_size)
     54         target_batch_length
     55     )
---> 56     loss.backward()
     57 
     58     # Clip gradient norms

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py in backward(self, gradient, retain_variables)
    144                     'or with gradient w.r.t. the variable')
    145             gradient = self.data.new().resize_as_(self.data).fill_(1)
--> 146         self._execution_engine.run_backward((self,), (gradient,), retain_variables)
    147 
    148     def register_hook(self, hook):

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/function.py in _do_backward(self, gradients, retain_variables)
    207     def _do_backward(self, gradients, retain_variables):
    208         self.retain_variables = retain_variables
--> 209         result = super(NestedIOFunction, self)._do_backward(gradients, retain_variables)
    210         if not retain_variables:
    211             del self._nested_output

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/function.py in backward(self, *gradients)
    215     def backward(self, *gradients):
    216         nested_gradients = _unflatten(gradients, self._nested_output)
--> 217         result = self.backward_extended(*nested_gradients)
    218         return tuple(_iter_None_tensors(result))
    219 

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in backward_extended(self, grad_output, grad_hy)
    314             grad_hy,
    315             grad_input,
--> 316             grad_hx)
    317 
    318         if any(self.needs_input_grad[1:]):

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py in backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_input, grad_hx)
    371                 hidden_size, dcy.size()))
    372         if not dhy.is_cuda or not dy.is_cuda or (dcy is not None and not dcy.is_cuda):
--> 373             raise RuntimeError('Gradients aren\'t CUDA tensors')
    374 
    375         check_error(cudnn.lib.cudnnRNNBackwardData(

RuntimeError: Gradients aren't CUDA tensors

Any suggestions on what I am doing wrong?

1 Answer

Make sure that every object that inherits from nn.Module also has .cuda() called on it, and make sure to call it before you pass any tensors to those objects (essentially, before training).

For example (and I am guessing your encoder and decoder are such objects), do this right before you call train():

encoder = encoder.cuda()
decoder = decoder.cuda()

This ensures that all of the model's parameters are initialized in CUDA memory.
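
As a quick sanity check (a sketch that assumes the encoderchar, encoder and decoder objects from your code and an available CUDA device), you can confirm that every parameter actually lives in GPU memory:

for name, model in [('encoderchar', encoderchar),
                    ('encoder', encoder),
                    ('decoder', decoder)]:
    # every parameter of a module moved with .cuda() should be on the GPU
    print(name, all(p.data.is_cuda for p in model.parameters()))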

Edit

In general, whenever you have this kind of error,

RuntimeError: Gradients aren't CUDA tensors

somewhere (from model creation, to defining the inputs, to finally supplying the outputs to the loss function) you missed putting a Variable object in GPU memory. You will have to go through every step in your model, verifying that all Variable objects are in GPU memory.
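
One way to do that (a sketch using the Variable names from your train() function, dropped inside train() right before the loss is computed) is to print where each intermediate Variable's data lives, so the one still on the CPU stands out:

def report_device(name, v):
    # v is a Variable; v.data is the underlying tensor
    print(name, 'cuda' if v.data.is_cuda else 'cpu')

report_device('encoder_input', encoder_input)
report_device('decoder_input', decoder_input)
report_device('all_decoder_outputs', all_decoder_outputs)
report_device('target_batch', target_batch)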

Additionally, you don't have to call .cuda() on the outputs. Given that the inputs are in GPU memory, all operations also take place in GPU memory, and so do the outputs.
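
For instance, this toy example (not your model; it only assumes a CUDA-capable machine) shows that a module on the GPU, given a CUDA input, already produces a CUDA output:

import torch
import torch.nn as nn
from torch.autograd import Variable

layer = nn.Linear(4, 2).cuda()          # module parameters on the GPU
x = Variable(torch.randn(3, 4).cuda())  # input on the GPU
y = layer(x)                            # no extra .cuda() needed on the output
print(y.data.is_cuda)                   # True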


5 Comments

I'm doing that: # Initialize models: encoderchar = EncodercharRNN(34, hidden_size, n_layers); encoder = EncoderRNN(hidden_size, hidden_size, n_layers); decoder = AttnDecoderRNN(attn_model, hidden_size, len(eng_words.vocab), n_layers, dropout=dropout_p) # Move models to GPU: if USE_CUDA: encoderchar = encoderchar.cuda(); encoder = encoder.cuda(); decoder = decoder.cuda()
Can you specify on which line this error occurs? The full stack trace would also be nice.
I've run through the code myself so many times but couldn't trace the mistake. Can you spot anything in my train() function?
It looks alright to me. But as a last resort, do this: first, remove all the .cuda() calls on all Variable objects. Then, only for the first input that passes through any network, make sure to call .cuda() on the tensor and not on the Variable that wraps it.
Thanks, I was able to spot the issue and fix it. The parameters input_batch and target_batch passed to the train() function had to be moved to the GPU before the call. Also, as suggested, I removed the .cuda() calls on the output variables. The combination of the above fixed the issue. Deeply appreciate your help.
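
For reference, a sketch of that fix in the calling cell (variable names taken from the stack trace above): move the input and target batches to the GPU before calling train(), and let the outputs stay wherever the model puts them.

if USE_CUDA:
    # move the batches to GPU memory before they enter the models
    data_input_batch_index = data_input_batch_index.cuda()
    data_target_batch_index = data_target_batch_index.cuda()
loss, ar1, ar2 = train(data_input_batch_index, data_input_batch_length,
                       data_target_batch_index, data_target_batch_length,
                       batch_size)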
