
Edit: SOLVED. The problem was the number of DataLoader workers; lowering num_workers fixed it.

I am using a 24 GB Titan RTX for an image segmentation U-Net with PyTorch.

It keeps throwing CUDA out of memory at different batch sizes. On top of that, I have more free memory than it says it needs, and lowering the batch size INCREASES the memory it tries to allocate, which doesn't make any sense.

Here is what I tried:

Image size = 448, batch size = 8

  • "RuntimeError: CUDA error: out of memory"

Image size = 448, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 24.00 GiB total capacity; 2.06 GiB already allocated; 19.66 GiB free; 2.31 GiB reserved in total by PyTorch)"

It says it tried to allocate 3.12 GiB while I have 19 GiB free, and it still throws an error??

Image size = 224, batch size = 8

  • "RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 24.00 GiB total capacity; 2.78 GiB already allocated; 19.15 GiB free; 2.82 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 24.00 GiB total capacity; 2.30 GiB already allocated; 19.38 GiB free; 2.59 GiB reserved in total by PyTorch)"

I reduced the batch size, but it tried to allocate more???

Image size = 224, batch size = 4

  • "RuntimeError: CUDA out of memory. Tried to allocate 482.00 MiB (GPU 0; 24.00 GiB total capacity; 2.21 GiB already allocated; 19.48 GiB free; 2.50 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 2

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 24.00 GiB total capacity; 1.44 GiB already allocated; 19.88 GiB free; 2.10 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 1

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 24.00 GiB total capacity; 894.36 MiB already allocated; 20.94 GiB free; 1.03 GiB reserved in total by PyTorch)"

Even with stupidly low image sizes and batch sizes...
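
For anyone debugging the same thing, here is a minimal sketch of how I check what the driver and PyTorch's caching allocator report right before the failing step (torch.cuda.mem_get_info needs a reasonably recent PyTorch; the helper name report_gpu_memory is just mine):

```python
import torch

def report_gpu_memory(device=0):
    """Print what the driver and PyTorch's caching allocator currently report."""
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level free/total
    allocated = torch.cuda.memory_allocated(device)            # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device)              # bytes held by the caching allocator
    gib = 1024 ** 3
    print(f"free {free_bytes / gib:.2f} GiB / total {total_bytes / gib:.2f} GiB | "
          f"allocated {allocated / gib:.2f} GiB | reserved {reserved / gib:.2f} GiB")

if torch.cuda.is_available():
    report_gpu_memory()
```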

1 Comment

You might want to consider adding your solution as an answer. (Mar 16, 2021)

1 Answer


SOLVED: the problem was the number of DataLoader workers; lowering num_workers fixed it.
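
For context, a minimal sketch of what that looks like with a standard torch.utils.data.DataLoader (the dummy TensorDataset, shapes, and batch size below are placeholders, not the real segmentation dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real images and masks (hypothetical shapes).
images = torch.randn(32, 3, 224, 224)
masks = torch.randint(0, 2, (32, 1, 224, 224))
train_dataset = TensorDataset(images, masks)

train_loader = DataLoader(
    train_dataset,
    batch_size=6,
    shuffle=True,
    num_workers=2,   # lowered worker count; 0 disables worker subprocesses entirely
    pin_memory=True,
)

for batch_images, batch_masks in train_loader:
    pass  # training step would go here
```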


3 Comments

How do you lower the number of workers?
I am using PyTorch; to reduce the number of workers, simply pass a value to the num_workers parameter of the DataLoader, like so: `train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, num_workers=4, shuffle=False)`. Note that num_workers must be a non-negative integer; num_workers=0 loads the data in the main process with no worker subprocesses.
Got the same problem as yours, but I already had num_workers set to 1. I have 500 square images of 500 px each, batch_size=1, and got the error: Tried to allocate 46.00 MiB (GPU 0; 32.00 GiB total capacity; 27.71 GiB already allocated; 118.54 MiB free; 28.38 GiB reserved in total by PyTorch)
