
Edit: SOLVED. The problem was the number of DataLoader workers; lowering num_workers fixed it.

I am using a 24 GB Titan RTX for an image segmentation U-Net with PyTorch.

It keeps throwing CUDA out of memory at different batch sizes. On top of that, I have more free memory than it says it needs, and lowering the batch size INCREASES the memory it tries to allocate, which doesn't make any sense.

Here is what I tried:

Image size = 448, batch size = 8

  • "RuntimeError: CUDA error: out of memory"

Image size = 448, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 24.00 GiB total capacity; 2.06 GiB already allocated; 19.66 GiB free; 2.31 GiB reserved in total by PyTorch)"

It says it tried to allocate 3.12 GiB while I have 19 GiB free, and it still throws an error??

Image size = 224, batch size = 8

  • "RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 24.00 GiB total capacity; 2.78 GiB already allocated; 19.15 GiB free; 2.82 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 24.00 GiB total capacity; 2.30 GiB already allocated; 19.38 GiB free; 2.59 GiB reserved in total by PyTorch)"

I reduced the batch size, but it tried to allocate more???

Image size = 224, batch size = 4

  • "RuntimeError: CUDA out of memory. Tried to allocate 482.00 MiB (GPU 0; 24.00 GiB total capacity; 2.21 GiB already allocated; 19.48 GiB free; 2.50 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 2

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 24.00 GiB total capacity; 1.44 GiB already allocated; 19.88 GiB free; 2.10 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 1

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 24.00 GiB total capacity; 894.36 MiB already allocated; 20.94 GiB free; 1.03 GiB reserved in total by PyTorch)"

Even with stupidly low image sizes and batch sizes...
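
For anyone debugging the same thing, here is a minimal sketch of how I check what the driver and PyTorch's caching allocator report right before the failing step (torch.cuda.mem_get_info needs a reasonably recent PyTorch; the helper name report_gpu_memory is just mine):

```python
import torch

def report_gpu_memory(device=0):
    """Print what the driver and PyTorch's caching allocator currently report."""
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level free/total
    allocated = torch.cuda.memory_allocated(device)            # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device)              # bytes held by the caching allocator
    gib = 1024 ** 3
    print(f"free {free_bytes / gib:.2f} GiB / total {total_bytes / gib:.2f} GiB | "
          f"allocated {allocated / gib:.2f} GiB | reserved {reserved / gib:.2f} GiB")

if torch.cuda.is_available():
    report_gpu_memory()
```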

1 Comment

You might want to consider adding your solution as an answer. (Mar 16, 2021)

1 Answer


SOLVED: the problem was the number of DataLoader workers; lowering num_workers fixed it.
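
For context, a minimal sketch of what that looks like with a standard torch.utils.data.DataLoader (the dummy TensorDataset, shapes, and batch size below are placeholders, not the real segmentation dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real images and masks (hypothetical shapes).
images = torch.randn(32, 3, 224, 224)
masks = torch.randint(0, 2, (32, 1, 224, 224))
train_dataset = TensorDataset(images, masks)

train_loader = DataLoader(
    train_dataset,
    batch_size=6,
    shuffle=True,
    num_workers=2,   # lowered worker count; 0 disables worker subprocesses entirely
    pin_memory=True,
)

for batch_images, batch_masks in train_loader:
    pass  # training step would go here
```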


3 Comments

How do you lower the number of workers?
I am using PyTorch; to reduce the number of workers, simply pass a value to the num_workers parameter of the DataLoader, like so: `train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, num_workers=4, shuffle=False)`. Note that num_workers must be a non-negative integer; num_workers=0 loads the data in the main process with no worker subprocesses.
Got the same problem as yours, but I already had num_workers set to 1. I have 500 square images of 500 px each, batch_size=1, and got the error: Tried to allocate 46.00 MiB (GPU 0; 32.00 GiB total capacity; 27.71 GiB already allocated; 118.54 MiB free; 28.38 GiB reserved in total by PyTorch)
