
I am running into the classic CUDA out of memory error.

What I want to do: load the same model with a different embedding matrix each time. I have to do that 300 times, once for each dimension of the word embeddings.

I am not training the model, which is why I call model.eval(); I thought that would be enough to keep PyTorch from building a graph.

Please note that I never move the model, nor the data, to CUDA. In fact, I wanted to debug the code on the CPU before running it on a GPU.

The loop below executes once; a RuntimeError is raised in the second iteration.

My guess is that the code is loading a new model into GPU memory at each iteration (which I did not know was possible without explicitly telling it to do so). The emb_matrix is quite heavy and could exhaust the GPU's memory.

emb_dim = 300
acc_dim = torch.zeros((emb_dim, 4))
for d in range(emb_dim):

    #create embeddings with one dimension shuffled
    emb_matrix = text_f.vocab.vectors.clone()

    #get a random permutation across one of the dimensions
    rand_index = torch.randperm(text_f.vocab.vectors.shape[0])
    emb_matrix[:, d] = text_f.vocab.vectors[rand_index, d]

    #load model with the scrambled embeddings
    model = load_classifier(emb_matrix, 
                            encoder_type = encoder_type)
    model.eval()
    for batch in batch_iters["test"]:
        x_pre = batch.premise
        x_hyp = batch.hypothesis
        y = batch.label

        #perform forward pass
        y_pred = model.forward(x_pre, x_hyp)        

        #calculate accuracies
        acc_dim[d] += accuracy(y_pred, y)/test_batches

        #avoid memory issues
        y_pred.detach()

    print(f"Dimension {d} accuracies: {acc_dim[d]}")   

I get the following error: RuntimeError: CUDA out of memory. Tried to allocate 146.88 MiB (GPU 0; 2.00 GiB total capacity; 374.63 MiB already allocated; 0 bytes free; 1015.00 KiB cached)

I tried moving the model and the data to the CPU, but I get exactly the same error.

I looked around for a way to fix the problem, but I could not find an obvious solution. Any suggestions on how to load the model and data in the right place, or how to clear the GPU's memory after each iteration, are welcome.

  • torch.cuda.empty_cache() Commented Apr 22, 2019 at 20:26
  • To get it to run entirely on the CPU for debugging, run the command export CUDA_VISIBLE_DEVICES=-1 before starting your program. This ensures that you won't be able to use the GPU and thus won't run out of GPU memory. Commented Apr 22, 2019 at 20:31
  • @Chris torch.cuda.empty_cache() does not free GPU memory held by tensors, right? It just releases PyTorch's cached memory back to the system, but it will not unload the model from the GPU. pytorch.org/docs/stable/cuda.html#torch.cuda.empty_cache Commented Apr 22, 2019 at 20:35
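The CUDA_VISIBLE_DEVICES suggestion from the comments can also be applied from inside Python, as a sketch; the key point is that the variable must be set before torch is imported, since PyTorch enumerates devices at import time (the torch import itself is only indicated in a comment here):

```python
import os

# Hide every GPU from this process. This must happen before `import torch`,
# otherwise PyTorch has already enumerated the CUDA devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# After this point, `import torch` would see no CUDA devices:
# torch.cuda.is_available() would return False, forcing all tensors
# and models onto the CPU regardless of .cuda() calls elsewhere.
assert os.environ["CUDA_VISIBLE_DEVICES"] == "-1"
```

This is equivalent to the shell-level export CUDA_VISIBLE_DEVICES=-1, but scoped to the current process.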

1 Answer


It looks like acc_dim accumulates the grad history - see https://pytorch.org/docs/stable/notes/faq.html
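The effect described in the linked FAQ can be reproduced in a few lines; this is a minimal sketch with a toy tensor, not the question's model:

```python
import torch

# A parameter that requires grad, standing in for the model's weights
w = torch.ones(1, requires_grad=True)

total = torch.zeros(1)
for _ in range(3):
    loss = (w * 2).sum()
    total = total + loss  # each addition extends total's autograd graph

# total is now connected to the graph of every iteration, so none of
# those intermediate graphs can be freed
assert total.grad_fn is not None

# Accumulating a plain Python number instead keeps no history
total_item = 0.0
for _ in range(3):
    loss = (w * 2).sum()
    total_item += loss.item()  # .item() extracts a detached float
assert isinstance(total_item, float)
```

In the question's loop, `acc_dim[d] += accuracy(y_pred, y)/test_batches` plays the role of `total = total + loss` above, so the graphs from every batch stay alive.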

Because you're only doing inference, with torch.no_grad(): should be used. This completely sidesteps the possible issue of accumulating grad history.

model.eval() doesn't prevent gradient bookkeeping from happening; it only switches the behavior of some layers, like dropout. model.eval() and with torch.no_grad(): should be used together for inference.
