
I'm trying to run training on an LLM for text generation. Even after various changes to my code, I am still getting this error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 4.54 GiB is free. Of the allocated memory 480.02 MiB is allocated by PyTorch, and 1.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This makes no sense, since, as the message itself states, my GPU has more than enough free memory. This is my first time deploying models, so any help is appreciated!

  • How much RAM does your system have? github.com/pytorch/pytorch/issues/40002 Commented Dec 21, 2023 at 23:07
  • I've got 64GB of RAM Commented Dec 21, 2023 at 23:14
  • One thing you can try is decreasing the batch size. Can you try that? Commented Dec 21, 2023 at 23:19
  • Already tried doing that. Batch size is currently set to 1, but I've played around with the number. Commented Dec 21, 2023 at 23:41
  • Fragmentation matters; it's not just how much memory you have but how it's divided. In my experience, NVIDIA's proprietary allocator typically does better than the default one -- the error message tells you where to look for the relevant configuration docs. Commented Dec 22, 2023 at 0:46
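Following up on the fragmentation comment: the error message points at `PYTORCH_CUDA_ALLOC_CONF`, which must be set before the first CUDA allocation. A minimal sketch of doing this from inside the script; the value `128` is an arbitrary example, not a recommendation:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator is
# initialized, so it must be set before the first CUDA allocation --
# in practice, before importing torch (or export it in the shell
# before launching the script).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch only after the variable is set
```

Equivalently, from the shell: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py`.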

1 Answer


I figured out the issue. My dataset was formatted in a way that was confusing the tokenizer, which caused the error. After re-creating the dataset, the issue went away.
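For anyone hitting the same symptom, a quick sanity check is to inspect the token lengths your tokenizer actually produces before training. This is a minimal sketch; `tokenized_examples` and `MAX_CONTEXT` are hypothetical placeholders for your own tokenized dataset and model context window:

```python
# A malformed dataset (e.g. rows accidentally concatenated into one
# giant string) can make the tokenizer emit huge sequences that blow
# up GPU memory even at batch size 1.
tokenized_examples = [[101, 2009, 2001, 102], [101, 7592, 102]]  # hypothetical token-ID lists

lengths = [len(ids) for ids in tokenized_examples]
print(f"max={max(lengths)} mean={sum(lengths) / len(lengths):.1f}")

# If max is far larger than the model's context window, the dataset
# formatting (not the GPU) is the likely culprit.
MAX_CONTEXT = 2048  # hypothetical context window
suspicious = [i for i, n in enumerate(lengths) if n > MAX_CONTEXT]
print(f"{len(suspicious)} examples exceed the context window")
```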
