I'm going to give a context-free snippet of code (variables such as `vocabulary_size`, `data`, `target`, and `loss_val` are defined earlier). The code works before I add the `.to(device)` calls:
```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def get_input_layer(word_idx):
    # One-hot encode the given word index
    x = torch.zeros(vocabulary_size).float().to(device)
    x[word_idx] = 1.0
    return x

embedding_dims = 5
device = torch.device("cuda:0")

# Weight matrices for the two layers
W1 = Variable(torch.randn(embedding_dims, vocabulary_size).float(), requires_grad=True).to(device)
W2 = Variable(torch.randn(vocabulary_size, embedding_dims).float(), requires_grad=True).to(device)

num_epochs = 100
learning_rate = 0.001

# The following runs inside the training loop (loop elided;
# data, target, and loss_val come from the surrounding code)
x = Variable(get_input_layer(data)).float().to(device)
y_true = Variable(torch.from_numpy(np.array([target])).long()).to(device)

# Forward pass
z1 = torch.matmul(W1, x).to(device)
z2 = torch.matmul(W2, z1).to(device)
log_softmax = F.log_softmax(z2, dim=0).to(device)

loss = F.nll_loss(log_softmax.view(1, -1), y_true).to(device)
loss_val += loss.data
loss.backward()

# Optimize values. This is done by hand rather than using an optimizer.
W1.data -= learning_rate * W1.grad.data
W2.data -= learning_rate * W2.grad.data
```
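To take the rest of the pipeline out of the picture, here is a minimal sketch that shows the same failure for me (assuming a CUDA device is available; the shapes are arbitrary):

```python
import torch
from torch.autograd import Variable

device = torch.device("cuda:0")

# Same pattern as W1/W2 above: create with requires_grad=True, then move to the GPU
w = Variable(torch.randn(3, 4).float(), requires_grad=True).to(device)
x = torch.ones(4, device=device)

loss = torch.matmul(w, x).sum()
loss.backward()

print(w.grad)  # None -- the same symptom as W1.grad in the full code above
```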
When I run the full code above, I get:

```
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'data'
```
The error triggers specifically on the line

```python
W1.data -= learning_rate * W1.grad.data
```
Checking confirms this is the case: `W1.grad` is `None` for some reason.
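Roughly what I checked (a sketch, not the exact session):

```python
print(W1.grad)           # None
print(W1.requires_grad)  # True, yet backward() accumulated no gradient
```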
The snippet runs in a loop, and the gradients are cleared at the end of each iteration. Everything works just fine if I remove all of the `.to(device)` calls. What is it that I'm doing wrong in trying to run this on my GPU?
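For contrast, the same construction without `.to(device)` behaves as expected (again a minimal sketch with arbitrary shapes):

```python
import torch
from torch.autograd import Variable

# Same as above, but the tensor stays on the CPU
w = Variable(torch.randn(3, 4).float(), requires_grad=True)
x = torch.ones(4)

loss = torch.matmul(w, x).sum()
loss.backward()

print(w.grad)  # a 3x4 gradient tensor, as expected
```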
Thank you for your time.