How can I fix this strange error: "RuntimeError: CUDA error: out of memory"?

Question

I successfully trained the network but got this error during validation:

RuntimeError: CUDA error: out of memory

How did you eventually fix the bug? Did you reduce the batch size?
– guanh01, Commented Oct 11, 2020 at 18:58
@Lauraishere, they commented below that they reduced the batch size and it did not work. Same for me. Did you solve your problem, and if so, could you please share?
– user5161995, Commented Feb 9, 2021 at 14:56
If the model is only being used for validation, you can try wrapping the forward pass in torch.no_grad().
– Abhibha Gupta, Commented Jun 5, 2021 at 15:05
Also, the PyTorch FAQ gives good insight into why this problem occurs and offers some solutions.
– Amir Pourmand, Commented Jun 9, 2022 at 7:58

Mateen Ulhaq · Accepted Answer · 2022-03-29 06:38:07Z

52

The error occurs because you ran out of memory on your GPU.

One way to solve it is to reduce the batch size until your code runs without this error.
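A minimal sketch of that search, assuming your training or validation step can be wrapped in a function that takes a batch size. The names run_step and largest_working_batch_size are hypothetical, not part of PyTorch. The only thing relied on is that the failure surfaces as a RuntimeError whose message contains "out of memory", as in the question.

```python
def is_out_of_memory(exc: BaseException) -> bool:
    """True for errors that look like the CUDA out-of-memory error in the question."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def largest_working_batch_size(run_step, start: int, minimum: int = 1) -> int:
    """Halve the batch size until run_step(batch_size) stops running out of memory.

    run_step is a hypothetical callable supplied by you: it should run one
    forward (and, for training, backward) pass at the given batch size.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as exc:
            if not is_out_of_memory(exc):
                raise  # unrelated failure: do not hide it
            batch_size //= 2
    raise RuntimeError(f"out of memory even at batch size {minimum}")
```

If the error persists even at batch size 1, as some commenters report, the batch size is probably not the cause and the other answers are worth trying.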

edited Mar 29, 2022 at 6:38

Mateen Ulhaq

answered Jan 26, 2019 at 7:11

K. Khanda

8 Comments

xiaoding chen Over a year ago

I tried it. I reduced the batch size to 8, but I still get the same error.

xiaoding chen Over a year ago

The training set is much larger than the validation set. Why is there no error during training, but there is one during validation?

K. Khanda Over a year ago

Another approach that helped me: I ran sudo rm -rf ~/.nv in a terminal and then rebooted my laptop.

K. Khanda Over a year ago

Also, tensors that were used during training may still be alive, and you are then creating even more during validation.

K. Khanda Over a year ago

You can check this issue: github.com/tensorflow/tensorflow/issues/19731

Milad shiri · Accepted Answer · 2020-06-15 06:47:50Z

48

The best way is to find the process that is holding GPU memory and kill it.

Find the PID of the Python process with:

nvidia-smi

Copy the PID and kill the process with:

sudo kill -9 <PID>
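If you would rather read the PIDs from a script than from the table, here is a rough sketch. It assumes nvidia-smi is on your PATH and accepts the --query-compute-apps fields used below (nvidia-smi --help-query-compute-apps lists the ones your driver supports). The function names are made up for this example.

```python
import subprocess

# Assumed query: one CSV row per compute process, memory in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list:
    """Turn the CSV text into dicts, largest GPU memory user first."""
    rows = []
    for line in csv_text.splitlines():
        try:
            pid_text, rest = line.split(",", 1)
            name, used_text = rest.rsplit(",", 1)
            rows.append(
                {"pid": int(pid_text), "name": name.strip(), "used_mib": int(used_text)}
            )
        except ValueError:
            continue  # blank line, or a value such as "[N/A]"
    return sorted(rows, key=lambda row: row["used_mib"], reverse=True)


def list_gpu_processes() -> list:
    """Run nvidia-smi and return the parsed process list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling list_gpu_processes() on the GPU machine gives you the candidates. Check that a PID really belongs to a stale job of yours before killing it.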

answered Jun 15, 2020 at 6:47

Milad shiri

3 Comments

IntegrateThis Over a year ago

What other programs could be taking up a lot of GPU memory, other than something obvious like a game?

krc Over a year ago

For others: if you stop a program mid-execution in Jupyter, it can continue to hog GPU memory. This answer makes it clear that the only way around the issue in that case is to restart the kernel.

questionto42 Over a year ago

Worked for me even without sudo in a Jupyter Notebook, using !kill ... or %kill ... . The kernel still needs to be restarted, though. Check the gc.collect() answer instead if you want to avoid restarting the kernel.

YoungMin Park · Accepted Answer · 2019-01-29 09:32:33Z

44

1. When you only run validation, not training,
you don't need to compute gradients, because there is no backward pass.
In that situation, you can put your code under

with torch.no_grad():
    ...
    net=Net()
    pred_for_validation=net(input)
    ...

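The code above still uses GPU memory for the model and its outputs, but it does not keep the intermediate activations that a backward pass would need, so it typically uses considerably less.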
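For a slightly fuller picture, here is a sketch of a validation loop built around torch.no_grad(). The tiny nn.Linear model and the random batches are placeholders so that the snippet runs on its own (on CPU as well); validate is a hypothetical helper name, not a PyTorch function.

```python
import torch
import torch.nn as nn


def validate(model, batches, loss_fn):
    """Average loss over batches without building an autograd graph."""
    model.eval()  # switch off dropout / batch-norm updates
    total_loss = 0.0
    count = 0
    with torch.no_grad():  # no activations are kept for a backward pass
        for inputs, targets in batches:
            outputs = model(inputs)
            # .item() turns the loss into a plain Python float
            total_loss += loss_fn(outputs, targets).item()
            count += 1
    return total_loss / max(count, 1)


# Placeholder model and data, only here to make the sketch runnable.
model = nn.Linear(4, 2)
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]
average_loss = validate(model, batches, nn.CrossEntropyLoss())
```

Remember to call model.train() again before you resume training.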

2. If you use the += operator on a loss tensor in your code,
each addition keeps that iteration's autograd graph alive, so memory grows with every step.
In that case, convert the loss to a plain Python number with float(), as described in the PyTorch FAQ:
https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory

The docs suggest float(), but in my case item() also worked:

entire_loss=0.0
for i in range(100):
    one_loss=loss_function(prediction,label)
    entire_loss+=one_loss.item()

3. If you use a for loop in your training code,
variables assigned inside it stay alive until they are reassigned or the loop ends.
So you can explicitly delete them after calling optimizer.step():

for one_epoch in range(100):
    ...
    optimizer.step()
    del intermediate_variable1,intermediate_variable2,...

answered Jan 29, 2019 at 9:32

YoungMin Park

2 Comments

Lei Hao Over a year ago

Regarding point 1, I use the pretrained bert model to transform the text data (only inference, no training). Still get cuda out of memory error.

stackoverflowuser2010 Over a year ago

@LeiHao: Try reducing your batch size.

Syscall · Accepted Answer · 2021-04-02 17:51:21Z

44

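I had the same issue and this code worked for me:

import gc
import torch

gc.collect()              # free unreachable Python objects that may still hold tensors
torch.cuda.empty_cache()  # release PyTorch's unused cached GPU memory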

edited Apr 2, 2021 at 17:51

Syscall

answered Apr 2, 2021 at 15:16

behnaz.sheikhi

4 Comments

jtomasrl Over a year ago

Good if you are running on Colab and need to reset GPU memory.

Inspi Over a year ago

Interesting. Where would you put this code in your program? At the beginning?

questionto42 Over a year ago

Works. Dropped the memory of all GPUs to less than a GB, no kernel restart needed.

questionto42 Over a year ago

See also "pytorch delete model from gpu" if you want to delete a chosen model or all models; check both answers.

Alessandro Suglia · Accepted Answer · 2019-01-26 16:28:58Z

11

It can happen for a number of reasons, which I try to list below:

Module parameters: check the dimensions of your modules. A linear layer that transforms a big input tensor (e.g., size 1000) into another big output tensor (e.g., size 1000) requires a weight matrix of size (1000, 1000).
RNN decoder maximum steps: if you're using an RNN decoder in your architecture, avoid looping for a large number of steps. Usually, you fix a number of decoding steps that is reasonable for your dataset.
Tensor usage: minimise the number of tensors that you create. The garbage collector won't release them until they go out of scope.
Batch size: incrementally increase your batch size until you run out of memory. It's a common trick that even well-known libraries implement (see the biggest_batch_first description for the BucketIterator in AllenNLP).
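To put a number on the first point, here is a back-of-the-envelope sketch. It assumes 4 bytes per parameter (float32) and counts only the layer's own weights and bias, not gradients, optimizer state or activations. linear_layer_bytes is a made-up helper name.

```python
def linear_layer_bytes(in_features: int, out_features: int,
                       bytes_per_param: int = 4, bias: bool = True) -> int:
    """Memory taken by the parameters of one fully connected layer."""
    params = in_features * out_features + (out_features if bias else 0)
    return params * bytes_per_param


# The (1000, 1000) example from the list above, in float32.
size_bytes = linear_layer_bytes(1000, 1000)
size_mib = size_bytes / 2**20
```

The 1000 by 1000 layer comes to about 4 MB, which is small on its own. The cost scales with the product of the two dimensions, so a 10x wider layer needs roughly 100x the memory.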

In addition, I would recommend having a look at the official PyTorch documentation: https://pytorch.org/docs/stable/notes/faq.html

answered Jan 26, 2019 at 16:28

Alessandro Suglia

1 Comment

xiaoding chen Over a year ago

The same network is used for training and validation. Why is there no error during training, but one during validation?

Toru Kikuchi · Accepted Answer · 2021-05-21 09:28:00Z

10

I am a PyTorch user. In my case, the cause of this error message was not actually GPU memory, but a version mismatch between PyTorch and CUDA.

You can check whether GPU memory is really the cause with the code below.

import torch
foo = torch.tensor([1,2,3])
foo = foo.to('cuda')

If even the code above raises an error, it is better to reinstall PyTorch to match your CUDA version. (In my case, this solved the problem.) See the PyTorch install page.

A similar problem can also occur with TensorFlow/Keras.

edited May 21, 2021 at 9:28

answered May 20, 2021 at 8:59

Toru Kikuchi

2 Comments

Blade Over a year ago

what does 're-install your Pytorch according to your CUDA version' mean? How do you correspond versions of cuda and pytorch? let's say I'm installing the nightly version, what cuda version is appropriate in your definition?

damagedgods Over a year ago

@Blade, the answer to your question won't be static. But this page suggests that the current nightly build is built against CUDA 10.2 (but one can install a CUDA 11.3 version etc.). Moreover, the previous versions page also has instructions on installing for specific versions of CUDA.

Themba Tman · Accepted Answer · 2021-06-09 14:55:01Z

6

If you are getting this error in Google Colab use this code:

import torch
torch.cuda.empty_cache()

answered Jun 9, 2021 at 14:55

Themba Tman

4 Comments

Lakshmi Narayanan Over a year ago

can we use this code in our local machines too? I keep getting this error as well, in a much detailed fashion : RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.95 GiB total capacity; 2.80 GiB already allocated; 39.31 MiB free; 2.89 GiB reserved in total by PyTorch) @ThembaTman

Themba Tman Over a year ago

Yeah, you can. empty_cache() doesn't increase the amount of GPU memory available for PyTorch. However, in some instances, it can help reduce GPU memory fragmentation.

sri_s Over a year ago

On Google Colab with a T4 instance, I was getting the CUDA out of memory error while running a Llama-2-7b fine-tuning project. After trying all kinds of suggestions, including reducing batch_size to 1 and the suggestion above, the only thing that worked for me was upgrading to a Colab Pro subscription and using an A100 GPU with high memory. The particular model I was running ended up using a peak of 22.6 GB of VRAM on the A100 (with a batch size of 1).

PYK Feb 18 at 2:12

In Colab, remember to load the model in 4-bit by passing load_in_4bit=True.

Amanda · Accepted Answer · 2023-06-22 17:34:14Z

2

I had this same error: RuntimeError: CUDA error: out of memory

I was able to resolve this on a machine with 4 GPUs by first running nvidia-smi, which showed that GPU 1 was already at full capacity from another user's job. My script was also trying to use that first GPU, which caused the error. I then ran export CUDA_VISIBLE_DEVICES=2,3,4 on the command line, and my script now looks only for GPUs 2, 3 and 4 and ignores GPU 1. (CUDA_VISIBLE_DEVICES takes the zero-based indices that nvidia-smi lists, so check the numbering on your own machine.)

In my case, the code didn't actually need a GPU but was trying to use one anyway, so I set export CUDA_VISIBLE_DEVICES="" and it now runs on the CPU without attempting to use a GPU.
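The same idea can be scripted. This is only a sketch: it assumes you feed it the text printed by nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits, and both function names are invented for the example. The environment variable has to be set before your framework initialises CUDA, so in practice before importing torch or tensorflow.

```python
import os


def pick_least_used_gpu(csv_text: str):
    """Return the index of the GPU with the most free memory, or None."""
    best_index, best_free = None, -1
    for line in csv_text.splitlines():
        try:
            index, used_mib, total_mib = (int(field) for field in line.split(","))
        except ValueError:
            continue  # blank or malformed row
        free_mib = total_mib - used_mib
        if free_mib > best_free:
            best_index, best_free = index, free_mib
    return best_index


def restrict_to_gpu(index) -> None:
    """Expose a single GPU to this process; None hides every GPU (CPU only)."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "" if index is None else str(index)
```

Inside the process, the chosen device then shows up as device 0, because the visible devices are renumbered from zero.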

answered Jun 22, 2023 at 17:34

Amanda

WaXxX333 · Accepted Answer · 2022-09-17 06:34:08Z

1

Not sure if this'll help you or not, but this is what solved the issue for me:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

Nothing else in this thread helped.

answered Sep 17, 2022 at 6:34

WaXxX333

1 Comment

siddharth gupta Over a year ago

Excellent, it's working after setting this. Will I need to run this command every time? What exactly does it do?

William Yolland · Accepted Answer · 2022-11-21 19:02:00Z

In my experience, this is not a typical CUDA OOM Error caused by PyTorch trying to allocate more memory on the GPU than you currently have.

The giveaway is the distinct lack of the following text in the error message.

Tried to allocate xxx GiB (GPU Y; XXX GiB total capacity; yyy MiB already allocated; zzz GiB free; aaa MiB reserved in total by PyTorch)

In my experience, this is an Nvidia driver issue. A reboot has always solved the issue for me, but there are times when a reboot is not possible.

One alternative to rebooting is to kill all Nvidia processes and reload the drivers manually. I always refer to the unaccepted answer of this question written by Comzyh when performing the driver cycle. Hope this helps anyone trapped in this situation.

James_SO · Accepted Answer · 2023-05-31 15:47:49Z

If you're running Keras/TF in Jupyter on a local server and another notebook is open which was accessing the GPU, you can also get this error. Just halt and close the other notebook(s). This can occur even if the other notebook isn't actively running anything.

This is distinct from PyTorch OOM errors, which typically refer to PyTorch's allocation of GPU RAM and are of the form

OutOfMemoryError: CUDA out of memory. Tried to allocate 734.00 MiB (GPU 0; 7.79 GiB total capacity; 5.20 GiB already allocated; 139.94 MiB free; 6.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Because PyTorch manages a subset of GPU RAM for a given job, it can sometimes draw an OOM error even though there's sufficient available RAM in the GPU (just not enough in Torch's self-allocation)
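As a rough illustration of how to read that message, the sketch below pulls the numbers out of it and applies the hint from the message itself: if reserved memory is much larger than allocated memory, fragmentation is a likely suspect. It assumes the exact wording quoted above; other PyTorch versions may phrase the message differently, in which case the parser simply returns None. The function names are hypothetical.

```python
import re

_SIZE = r"([\d.]+)\s*(MiB|GiB)"
_PATTERN = re.compile(
    rf"Tried to allocate {_SIZE}.*?{_SIZE} total capacity; "
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved"
)
_KEYS = ("requested", "total", "allocated", "free", "reserved")


def _to_mib(value: str, unit: str) -> float:
    return float(value) * (1024.0 if unit == "GiB" else 1.0)


def parse_oom_message(message: str):
    """Return the sizes in MiB keyed by name, or None if the text does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    groups = match.groups()
    return {key: _to_mib(groups[2 * i], groups[2 * i + 1]) for i, key in enumerate(_KEYS)}


def looks_fragmented(info: dict) -> bool:
    """Heuristic: the cached-but-unused memory alone would have covered the request."""
    return info["reserved"] - info["allocated"] >= info["requested"]
```

For the message quoted above, the gap between reserved and allocated memory is well over the 734 MiB requested, which is exactly the situation where max_split_size_mb is suggested. The bare "CUDA error: out of memory" from the question does not match at all.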

These errors can be a bit obscure to troubleshoot, but generally three techniques can be helpful:

at the head of your notebook, add these lines:
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"
delete objects that are on the GPU as soon as you don't need them anymore
reduce things like batch_size in training or testing scenarios

You can monitor GPU RAM simplistically with watch nvidia-smi

Every 2.0s: nvidia-smi                                                                     numbaCruncha123: Wed May 31 11:30:57 2023

Wed May 31 11:30:57 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:26:00.0 Off |                  N/A |
| 37%   33C    P2    34W / 175W |   7915MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2905      C   ...user/z_Venv/NC/bin/python     1641MiB |
|    0   N/A  N/A     31511      C   ...user/z_Venv/NC/bin/python     6271MiB |
+-----------------------------------------------------------------------------+

This will tell you what's using RAM across the entire GPU.

Note: if you've got a notebook running but don't see anything here, it's possible you're running on the CPU.

dgellow · Accepted Answer · 2020-08-22 19:57:40Z

0

If someone arrives here because of fast.ai, the batch size of a loader such as ImageDataLoaders can be controlled via bs=N where N is the size of the batch.

My dedicated GPU is limited to 2GB of memory, using bs=8 in the following example worked in my situation:

from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(244), num_workers=0, bs=8)

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

answered Aug 22, 2020 at 19:57

dgellow

1 Comment

John Deighan Over a year ago

This is exactly where I was encountering this error - trying to execute the above jupyter cell for the book "Deep Learning for Coders with fastai and pytorch". However, at first, it didn't work. Even with num_workers=0 and bs=8, it ran out of memory. I tried using bs=4, I tried shutting down all other running apps, still out of memory. But then, I decided to reboot (always a good idea with Windows), and after that, it took a while, but ran successfully. In fact, thinking about it, I'd probably recommend rebooting first, then using just num_workers=0 (which is necessary under Windows).

ah bon · Accepted Answer · 2021-05-22 13:54:55Z

0

Problem solved by the following code:

import os
os.environ['CUDA_VISIBLE_DEVICES']='2, 3'

answered May 22, 2021 at 13:54

ah bon

1 Comment

Adam Burke Over a year ago

I guess this would only work when you had multiple GPUs?

Aadesh Kulkarni · Accepted Answer · 2023-06-04 05:44:52Z

0

Find out which other processes are also using the GPU and free up that memory.

Find the PID of the Python process by running:

nvidia-smi

and kill it using:

sudo kill -9 <PID>

answered Jun 4, 2023 at 5:44

Aadesh Kulkarni

Prajot Kuvalekar · Accepted Answer · 2023-11-29 07:19:55Z

You can try something like this before your training loop:

model = UNet(n_channels, n_classes)
model.use_checkpointing()  # enable once, before training starts

for i in range(epochs):
    torch.cuda.empty_cache()
    your_train_function()

where your model definition looks like the block below:

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes, bilinear=False):
        super(UNet, self).__init__()
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.bilinear = bilinear

        self.inc = (DoubleConv(n_channels, 64))
        self.down1 = (Down(64, 128))
        self.down2 = (Down(128, 256))
        self.down3 = (Down(256, 512))
        factor = 2 if bilinear else 1
        self.down4 = (Down(512, 1024 // factor))
        self.up1 = (Up(1024, 512 // factor, bilinear))
        self.up2 = (Up(512, 256 // factor, bilinear))
        self.up3 = (Up(256, 128 // factor, bilinear))
        self.up4 = (Up(128, 64, bilinear))
        self.outc = (OutConv(64, n_classes))

    def forward(self, x):
        # Each block goes through self._run, which applies activation
        # checkpointing once use_checkpointing() has been called.
        x1 = self._run(self.inc, x)
        x2 = self._run(self.down1, x1)
        x3 = self._run(self.down2, x2)
        x4 = self._run(self.down3, x3)
        x5 = self._run(self.down4, x4)
        x = self._run(self.up1, x5, x4)
        x = self._run(self.up2, x, x3)
        x = self._run(self.up3, x, x2)
        x = self._run(self.up4, x, x1)
        logits = self._run(self.outc, x)
        return logits

    def _run(self, block, *inputs):
        # torch.utils.checkpoint is a module; the callable is
        # torch.utils.checkpoint.checkpoint (needs `import torch.utils.checkpoint`).
        if getattr(self, "checkpointing", False):
            return torch.utils.checkpoint.checkpoint(block, *inputs, use_reentrant=False)
        return block(*inputs)

    def use_checkpointing(self):
        # Trade compute for memory: activations are recomputed during backward.
        self.checkpointing = True
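To try activation checkpointing on its own, here is a small self-contained sketch. It is not the UNet above: the nn.Sequential stage is a toy stand-in and CheckpointedBlock is a hypothetical wrapper class. It assumes a PyTorch version whose torch.utils.checkpoint.checkpoint function accepts use_reentrant=False.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Hypothetical wrapper that runs `block` under activation checkpointing."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, *inputs):
        # Activations inside `block` are not stored; they are recomputed
        # during the backward pass, trading extra compute for less memory.
        return checkpoint(self.block, *inputs, use_reentrant=False)


# Toy stand-in for one convolutional stage.
stage = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)
wrapped = CheckpointedBlock(stage)

x = torch.randn(2, 3, 16, 16, requires_grad=True)
out = wrapped(x)
out.sum().backward()  # gradients still reach the stage's parameters
```

Checkpointing only helps during training. For validation, torch.no_grad() already avoids storing activations.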

Mohamed Mahmoud · Accepted Answer · 2024-04-03 06:27:56Z

0

Do not call model.zero_grad() during inference or validation, as this can allocate a large amount of memory.

answered Apr 3, 2024 at 6:27

Mohamed Mahmoud

Sazzad Hissain Khan · Accepted Answer · 2024-06-10 10:00:52Z

0

Please check whether other programs are holding your GPU memory. In Jupyter Notebook, you can go to the Running tab, shut down all the other notebooks, and try to run yours again.

answered Jun 10, 2024 at 10:00

Sazzad Hissain Khan

mikey · Accepted Answer · 2025-10-09 13:56:13Z

0

I've found it helpful in these situations to kill all python processes that are running before training a model that uses the gpu. It will give you a clean slate to start with.

WARNING: This will kill any open notebooks or any other python processes.

You can kill all python processes by typing pkill -9 python in the terminal. I've found this to work even when torch.cuda.empty_cache() does not actually release the gpu memory.

answered Oct 9 at 13:56

mikey

Gino Mempin · Accepted Answer · 2023-01-25 09:50:03Z

-3

I faced the same issue with my computer. All you have to do is customize your configuration file to match your computer's specifications. Turns out my computer takes image sizes below 600 X 600 and when I adjusted the same in the configuration file, the program ran smoothly.

edited Jan 25, 2023 at 9:50

Gino Mempin

answered Dec 14, 2020 at 5:30

nimish nahar
