I am trying to encode documents sentence-wise with a Hugging Face transformers model. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:

import time

import nltk
import torch

def pre_encode_wikipedia(model, tokenizer, device, save_path):
  
  document_data_list = []

  # wikipedia_small (the dataset being encoded) is assumed to be defined elsewhere
  for iteration, document in enumerate(wikipedia_small['text']):
    torch.cuda.empty_cache()

    sentence_embeds_per_doc = [torch.randn(128)]
    attention_mask_per_doc = [1]
    special_tokens_per_doc = [1]

    doc_split = nltk.sent_tokenize(document)
    doc_tokenized = tokenizer.batch_encode_plus(doc_split, padding='longest', truncation=True, max_length=512, return_tensors='pt')

    for key, value in doc_tokenized.items():
      doc_tokenized[key] = doc_tokenized[key].to(device)

    with torch.no_grad():  
      doc_encoded = model(**doc_tokenized)

    for sentence in doc_encoded['last_hidden_state']:
      sentence[0].to('cpu')
      sentence_embeds_per_doc.append(sentence[0])
      attention_mask_per_doc.append(1)
      special_tokens_per_doc.append(0)

    sentence_embeds = torch.stack(sentence_embeds_per_doc)
    attention_mask = torch.FloatTensor(attention_mask_per_doc)
    special_tokens_mask = torch.FloatTensor(special_tokens_per_doc)

    document_data = torch.utils.data.TensorDataset(*[sentence_embeds, attention_mask, special_tokens_mask])
    torch.save(document_data, f'{save_path}{time.strftime("%Y%m%d-%H%M%S")}{iteration}.pt')
    print(f"Document at {iteration} encoded and saved.")

After about 200-300 iterations on my local GTX 1060 (3 GB), I get a CUDA out-of-memory error. Running this code on Colab, which has more GPU RAM, gives me a few thousand iterations.
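One way to tell a steady leak apart from a one-off spike is to log allocated and peak CUDA memory once per document. The sketch below is a hypothetical helper (`log_gpu_memory` is not part of the original code), meant to be called at the end of the loop body; it only uses `torch.cuda.memory_allocated`, `torch.cuda.max_memory_allocated` and `torch.cuda.reset_peak_memory_stats`.

```python
import torch

def log_gpu_memory(iteration):
    """Print current and peak allocated CUDA memory in MiB and return them.

    Hypothetical helper; returns None on machines without CUDA.
    """
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"doc {iteration}: allocated {allocated:.1f} MiB, peak {peak:.1f} MiB")
    # Start a fresh peak measurement for the next document.
    torch.cuda.reset_peak_memory_stats()
    return allocated, peak
```

If `allocated` climbs from document to document, something is probably holding references across iterations; if it stays flat and only `peak` jumps on particular documents, those documents are likely producing unusually large batches.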

Things I've tried:

  • Adding torch.cuda.empty_cache() to the start of every iteration to clear out previously held tensors
  • Wrapping the model in torch.no_grad() to disable the computation graph
  • Setting model.eval() to disable any stochastic properties that might take up memory
  • Sending the output straight to the CPU in the hope of freeing up memory

I'm baffled as to why my memory keeps overflowing. I've trained several larger models, applying all the standard training-loop practices (optimizer.zero_grad(), etc.), and never had this problem. Why does it appear during this seemingly trivial task?

Edit #1: Changing sentence[0].to('cpu') to cpu_sentence = sentence[0].to('cpu') gave me a few thousand iterations before VRAM usage suddenly spiked, causing the run to crash. [Screenshot: VRAM usage spiking before the crash]
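One possible explanation for a sudden spike (an assumption, not something confirmed in this thread) is that batch_encode_plus receives every sentence of a document at once, so a single very long article becomes one very large batch. A minimal sketch of capping the batch size by encoding sentences in fixed-size chunks follows; `chunked`, `encode_in_chunks` and the default `batch_size=32` are hypothetical choices, and `model`/`tokenizer` are assumed to behave like the Hugging Face objects in the question.

```python
import torch

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def encode_in_chunks(model, tokenizer, sentences, device, batch_size=32):
    """Return a CPU tensor of [CLS] embeddings, one row per sentence.

    Hypothetical helper: only `batch_size` sentences are on the GPU at a time,
    so peak memory no longer grows with the number of sentences per document.
    Assumes `sentences` is non-empty.
    """
    cls_rows = []
    for batch in chunked(sentences, batch_size):
        encoded = tokenizer.batch_encode_plus(
            batch, padding='longest', truncation=True,
            max_length=512, return_tensors='pt').to(device)
        with torch.no_grad():
            output = model(**encoded)
        # Keep only a CPU copy of the [CLS] vectors; the GPU activations
        # are released once `output` is rebound on the next iteration.
        cls_rows.append(output.last_hidden_state[:, 0].cpu())
    return torch.cat(cls_rows)
```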

Comments
  • I don't think sentence[0].to('cpu') moves your tensor to the CPU in place; it returns a copy. Could you check? Commented Jan 26, 2021 at 18:58
  • How do I check whether it's making a copy or not? Commented Jan 26, 2021 at 19:43
  • Do you also get this error on Colab after the few thousand iterations? Commented Jan 27, 2021 at 13:17
  • Yes, the same error. I'm assuming it's just because the Colab GPUs have more VRAM, so it takes more iterations to fill up. Commented Jan 27, 2021 at 15:37
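On how to check: according to the PyTorch documentation, Tensor.to returns `self` when the tensor already has the requested dtype and device, and otherwise returns a new tensor, leaving the original untouched. The sketch below shows this on the CPU with a dtype change, since a device change needs a GPU; the optional last part repeats the check with `.to('cpu')` when CUDA is available.

```python
import torch

x = torch.zeros(3)                  # float32 tensor on the CPU

same = x.to(torch.float32)          # nothing to convert
print(same is x)                    # True: `self` is returned

converted = x.to(torch.float64)     # conversion needed
print(converted is x)               # False: a new tensor is returned
print(x.dtype, converted.dtype)     # the original keeps its dtype

# Calling .to() without keeping the result changes nothing:
x.to(torch.float64)
print(x.dtype)                      # still torch.float32

if torch.cuda.is_available():
    on_gpu = x.cuda()
    on_cpu = on_gpu.to('cpu')       # a CPU copy; on_gpu stays on the GPU
    print(on_gpu.device, on_cpu.device)
```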

1 Answer

Can you try replacing

sentence[0].to('cpu')

with

cpu_sentence = sentence[0].to('cpu')

See the Tensor.to documentation for more info: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.to
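For reference, a sketch of the per-document tensors built with the answer's fix applied, assuming the rest of the question's function is unchanged. `build_document_tensors` is a hypothetical helper name, and copying all [CLS] vectors with one slice instead of sentence by sentence is a design choice that is not part of the answer. It mirrors the question's random placeholder row, all-ones attention mask, and special-tokens mask that marks only the placeholder.

```python
import torch

def build_document_tensors(last_hidden_state, hidden_size=128):
    """Return (embeddings, attention_mask, special_tokens_mask) on the CPU.

    last_hidden_state: (num_sentences, seq_len, hidden_size) tensor on any device.
    """
    num_sentences = last_hidden_state.shape[0]
    # .to('cpu') returns a copy; keep it, otherwise nothing is moved.
    cls_cpu = last_hidden_state[:, 0].to('cpu')
    # A random placeholder row first, as in the question's code.
    embeds = torch.cat([torch.randn(1, hidden_size), cls_cpu], dim=0)
    attention_mask = torch.ones(num_sentences + 1)
    special_tokens_mask = torch.zeros(num_sentences + 1)
    special_tokens_mask[0] = 1  # only the placeholder row is "special"
    return embeds, attention_mask, special_tokens_mask
```

Because torch.cat allocates a new CPU tensor, the returned embeddings hold no reference to the GPU output.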

6 Comments

This seemed to work at first: VRAM utilization stayed reasonably low for a few thousand iterations, about an order of magnitude more than I would usually get, so something definitely helped. But then RuntimeError: CUDA out of memory. Tried to allocate 112.00 MiB (GPU 0; 3.00 GiB total capacity; 1.95 GiB already allocated; 0 bytes free; 1.98 GiB reserved in total by PyTorch) reappeared. I'm posting a picture of the VRAM spike in the question.
Did you change it like this: cpu_sentence = sentence[0].to('cpu'), followed by sentence_embeds_per_doc.append(cpu_sentence)?
I did, yes. Got any other idea what I could try?
I think you should look into what is allocating this much memory (112.00 MiB).
When you import the pretrained model, you can do the following: model = ???.from_pretrained("google/bert_uncased_L-2_H-128_A-2"), then model.to("cpu").
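As a back-of-the-envelope check on where a 112.00 MiB allocation could come from, assume (this is not verified in the thread) that it is a float32 attention-score tensor of shape (batch, heads, seq_len, seq_len). The model name indicates 2 attention heads (A-2), so one sentence padded to max_length=512 needs 2 × 512 × 512 × 4 bytes = 2 MiB, and 112 MiB would match a batch of 56 sentences padded to 512 tokens. Since the question encodes each document as a single batch, one long article could plausibly trigger such an allocation. `attention_scores_mib` is a hypothetical helper.

```python
def attention_scores_mib(batch_size, seq_len, num_heads=2, bytes_per_value=4):
    """Size in MiB of one (batch, heads, seq_len, seq_len) float32 tensor."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value / 2**20

print(attention_scores_mib(1, 512))    # 2.0
print(attention_scores_mib(56, 512))   # 112.0
```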