
I'm trying to run training on an LLM for text generation. Even after various changes to my code, I am still getting this error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 4.54 GiB is free. Of the allocated memory 480.02 MiB is allocated by PyTorch, and 1.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This makes no sense, since, as the message itself states, my GPU has more than enough free memory. This is my first time deploying models, so any help is appreciated!

  • How much RAM does your system have? github.com/pytorch/pytorch/issues/40002 Commented Dec 21, 2023 at 23:07
  • I've got 64GB of RAM Commented Dec 21, 2023 at 23:14
  • One thing you can try is decreasing the batch size. Can you try that? Commented Dec 21, 2023 at 23:19
  • Already tried doing that. Batch size is currently set to 1, but I've played around with the number. Commented Dec 21, 2023 at 23:41
  • Fragmentation matters; it's not just how much memory you have but how it's divided. In my experience, NVIDIA's proprietary allocator typically does better than the default one -- the error message tells you where to look for the relevant configuration docs. Commented Dec 22, 2023 at 0:46
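Following up on the fragmentation comment: the error message points at `PYTORCH_CUDA_ALLOC_CONF`, which must be set before the first CUDA allocation. A minimal sketch of doing this from inside the script; the value `128` is an arbitrary example, not a recommendation:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator is
# initialized, so it must be set before the first CUDA allocation --
# in practice, before importing torch (or export it in the shell
# before launching the script).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch only after the variable is set
```

Equivalently, from the shell: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py`.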

1 Answer


I figured out the issue. My dataset was formatted in a way that was confusing the tokenizer, which caused the error. After re-creating the dataset, the issue went away.
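For anyone hitting the same symptom, a quick sanity check is to inspect the token lengths your tokenizer actually produces before training. This is a minimal sketch; `tokenized_examples` and `MAX_CONTEXT` are hypothetical placeholders for your own tokenized dataset and model context window:

```python
# A malformed dataset (e.g. rows accidentally concatenated into one
# giant string) can make the tokenizer emit huge sequences that blow
# up GPU memory even at batch size 1.
tokenized_examples = [[101, 2009, 2001, 102], [101, 7592, 102]]  # hypothetical token-ID lists

lengths = [len(ids) for ids in tokenized_examples]
print(f"max={max(lengths)} mean={sum(lengths) / len(lengths):.1f}")

# If max is far larger than the model's context window, the dataset
# formatting (not the GPU) is the likely culprit.
MAX_CONTEXT = 2048  # hypothetical context window
suspicious = [i for i, n in enumerate(lengths) if n > MAX_CONTEXT]
print(f"{len(suspicious)} examples exceed the context window")
```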
