
My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

Placing cudaDeviceReset() at the beginning of the program only affects the context created by the current process and doesn't flush the memory allocated before it.

I'm accessing a Fedora server with that GPU remotely, so physical reset is quite complicated.

So the question is: is there any way to flush the device memory in this situation?

  • Although nvidia-smi --gpu-reset is not available, I can still get some information with nvidia-smi -q. Most fields show 'N/A', but some of the information is useful. Here is the relevant output: Memory Usage: Total 1535 MB, Used 1227 MB, Free 307 MB. Commented Mar 4, 2013 at 8:35
  • If you have root access, you can unload and reload the nvidia driver. Commented Mar 4, 2013 at 10:14
  • If you run ps -ef | grep `whoami` and the results show any processes that appear to be related to your crashed session, kill those. Commented Mar 4, 2013 at 16:18
  • Have you tried sudo rmmod nvidia? Commented Mar 4, 2013 at 22:46
  • nvidia-smi -caa worked great for me to release memory on all GPUs at once. Commented Jun 25, 2019 at 11:54

16 Answers


Check what is using your GPU memory with

sudo fuser -v /dev/nvidia*

Your output will look something like this:

                     USER        PID  ACCESS COMMAND
/dev/nvidia0:        root       1256  F...m  Xorg
                     username   2057  F...m  compiz
                     username   2759  F...m  chrome
                     username   2777  F...m  chrome
                     username   20450 F...m  python
                     username   20699 F...m  python

Then kill the PIDs that you no longer need, either in htop or with

sudo kill -9 PID

In the example above, PyCharm was eating a lot of memory, so I killed 20450 and 20699.
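
If you prefer to script this instead of reading the table by hand, here is a minimal Python sketch of the same idea (not part of the original answer; it relies on the assumption that fuser sends its verbose table to stderr and the bare PIDs to stdout, so treat it as a starting point and run it as root to see other users' processes):

import glob
import re
import subprocess

# Run fuser on all /dev/nvidia* device nodes and collect the PIDs using them.
devices = glob.glob("/dev/nvidia*")
result = subprocess.run(["fuser", "-v"] + devices, capture_output=True, text=True)

print(result.stderr)        # the USER / PID / ACCESS / COMMAND table
pids = sorted({int(p) for p in re.findall(r"\d+", result.stdout)})
print("PIDs using the GPU:", pids)

# Once you are sure a PID belongs to your dead session:
# subprocess.run(["kill", "-9", str(pid)])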


10 Comments

Thank you! For some reason, I had a process hogging all my VRAM that wasn't showing up in nvidia-smi.
I need to use this a lot when running deep learning in different Jupyter notebooks. The only issue is knowing exactly which PID is which. Any tips on this?
@josh I kill them one at a time, making a mental note of the COMMAND.
@kRazzyR It uses it for speeding up computations, I assume, and for rendering graphics, but maybe also other things. This did cause me a lot of issues when I installed the Nvidia drivers, CUDA and cuDNN; I had to turn a lot of it off.
@LittleBobbyTables The best way of finding the PID that I've found is to use nvtop; the PID listed there can usually be killed directly. If not, look for PIDs listed by fuser close to that one (possibly also use htop -u <user> on the user from the fuser output to see if the process looks stalled).

First type

nvidia-smi

then find the PID of the process you want to kill and run

sudo kill -9 PID

4 Comments

Brilliant, this one actually worked for me. PID should be replaced with the PID of the process that is using the GPU (which you can find with nvidia-smi).
the command nvidia-smi returns Failed to initialize NVML: Driver/library version mismatch
nvidia-smi gives me two processes, and when I go to kill them it says there is no such process. Both processes are called Xwayland.
This should be the chosen answer

For those using Python:

import torch, gc
gc.collect()
torch.cuda.empty_cache()
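
To make it clearer what this does (and doesn't do), here is a small sketch, assuming PyTorch and a CUDA-capable GPU: empty_cache() only returns cached memory that the current process no longer references, which is why it can't reclaim memory held by another (e.g. crashed) process, as the comments below point out.

import gc
import torch

x = torch.empty(256, 1024, 1024, device="cuda")   # roughly 1 GiB of float32
print("after alloc:   ", torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
      torch.cuda.memory_reserved() // 2**20, "MiB reserved")

del x                      # drop the last reference to the tensor
gc.collect()               # allocated drops, but the blocks stay cached
print("after del/gc:  ", torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
      torch.cuda.memory_reserved() // 2**20, "MiB reserved")

torch.cuda.empty_cache()   # now the cached blocks are returned to the driver
print("after clearing:", torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
      torch.cuda.memory_reserved() // 2**20, "MiB reserved")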

4 Comments

This does not in any way address what the questioner was asking about.
nevertheless answered my problem (which is admittedly not the exact same as the OP asked, but matches the title while searching)
same here, any help is appreciated :)
I too found this answer useful. Thank you.

Although it should be unnecessary to do this in anything other than exceptional circumstances, the recommended way to do this on Linux hosts is to unload the nvidia driver by doing

$ rmmod nvidia 

with suitable root privileges and then reloading it with

$ modprobe nvidia

If the machine is running X11, you will need to stop it manually beforehand and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.
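
As a rough sketch of that sequence in script form (assumptions not stated in the answer: a systemd host whose display-manager service is called gdm, which may be lightdm, sddm, etc. on your machine; on newer driver stacks you may also have to remove nvidia_uvm, nvidia_drm and nvidia_modeset before nvidia itself will unload):

import subprocess

# Run as root; adjust "gdm" to whatever display manager the host uses.
subprocess.run(["systemctl", "stop", "gdm"], check=True)    # stop X11 first
subprocess.run(["rmmod", "nvidia_uvm"], check=False)        # CUDA module, if it is loaded
subprocess.run(["rmmod", "nvidia"], check=True)             # unload the driver
subprocess.run(["modprobe", "nvidia"], check=True)          # reload it
subprocess.run(["systemctl", "start", "gdm"], check=True)   # bring X11 back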

This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag

1 Comment

I cannot run the above command; the error says CUDA is in use. So I killed the PID using the solution provided by stackoverflow.com/a/46597252/3503565. It works for me.

One can also use nvtop, which gives an interface very similar to htop, but showing your GPU(s) usage instead, with a nice graph. You can also kill processes directly from here.

Here is a link to its GitHub: https://github.com/Syllo/nvtop

[Screenshot: the NVTOP interface]

Comments


I also had the same problem, and I saw a good solution on Quora, using

sudo kill -9 PID.

see https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi

1 Comment

Worked a treat when I accidentally opened and loaded two different Jupyter notebooks with VGG16. Warning: it kills the notebooks. I guess you could pick one to free up some memory for the other, but I don't know how you select the PID for a given notebook.

To kill all processes on the GPU:

sudo fuser -v /dev/nvidia* -k

3 Comments

Instead of simply providing the answer directly, try writing a detailed comment that explains the solution, as long as the explanation is not too lengthy. @AbuAli .
I had some processes that were blocked and impossible to kill by PID (as if they were already dead). This solved it.
I would suggest simply running sudo fuser -v /dev/nvidia0 -k, replacing 0 with the id of the GPU you want to clean. The problem with * is that it also matches nvidia-nvlink and nvidia-nvswitch*.

Normally I just use nvidia-smi, but for some problems it's not enough (something is still in CUDA memory).

The nvidia-smi "kill all" one-liner is:

nvidia-smi | grep 'python' | awk '{ print $5 }' | xargs -n1 kill -9

If you're still hitting unexpected memory errors or similar problems then try:

sudo fuser -v /dev/nvidia* | cut -d' ' -f2- | sudo xargs -n1 kill -9

Comments


On macOS (/ OS X), if someone else is having trouble with the OS apparently leaking memory:

  • https://github.com/phvu/cuda-smi is useful for quickly checking free memory
  • Quitting applications seems to free the memory they use. Quit everything you don't need, or quit applications one-by-one to see how much memory they used.
  • If that doesn't cut it (quitting about 10 applications freed about 500MB / 15% for me), the biggest consumer by far is WindowServer. You can Force quit it, which will also kill all applications you have running and log you out. But it's a bit faster than a restart and got me back to 90% free memory on the cuda device.

Comments


On Ubuntu 20.04, type the following in the terminal:

nvtop

If killing the offending process directly from nvtop doesn't work, then find and note the exact PID of the process with the most GPU usage and run

sudo kill <PID>

Comments


Expanding on the Python solution above, you can get further detail on the memory being cleared and print the outcome:

import torch
import gc

def print_gpu_memory():
    allocated = torch.cuda.memory_allocated() / (1024**2)
    cached = torch.cuda.memory_reserved() / (1024**2)
    print(f"Allocated: {allocated:.2f} MB")
    print(f"Cached: {cached:.2f} MB")

# Before clearing the cache
print("Before clearing cache:")
print_gpu_memory()

# Clearing cache
gc.collect()
torch.cuda.empty_cache()

# After clearing the cache
print("\nAfter clearing cache:")
print_gpu_memory()

Comments


If, after killing one process, the next one starts (as noted in a comment), for example when a bash script calls multiple Python scripts and you want to kill them but can't find the right PID, you can use ps -ef. There you'll find the PID of your "problematic" process and also its PPID (parent PID). Use kill PPID, kill -9 PPID, or sudo kill PPID to stop the processes.
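
Here is a sketch of the same idea in Python, using the third-party psutil package (an assumption, not part of the answer), with a hypothetical PID taken from nvidia-smi or fuser:

import psutil   # third-party: pip install psutil

# Given the PID of one of the python workers, walk up to its parent
# (e.g. the bash script that keeps launching new jobs) and kill the
# parent together with any children it still has.
child = psutil.Process(20450)            # hypothetical PID
parent = child.parent()
print("Parent process:", parent.pid, parent.name())

for proc in parent.children(recursive=True):
    proc.kill()                          # kill the remaining workers first
parent.kill()                            # then the wrapper script itself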

Comments


If all of this does not work, I found another answer here:

How to kill process on GPUs with PID in nvidia-smi using keyword?

nvidia-smi | grep 'python' | awk '{ print $X }' | xargs -n1 kill -9

Note that X (in the awk expression) corresponds to the column of the nvidia-smi output that contains the PID. In a typical nvidia-smi process table the PID is in the fifth column, so you would replace X with 5.
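
If your nvidia-smi is recent enough to support its query flags, a sketch like the following avoids guessing the column number altogether; the 'python' filter mirrors the grep above and is only an example:

import subprocess

# Ask nvidia-smi for compute PIDs and process names in CSV form, then
# filter by name and kill the matches (assumes --query-compute-apps is
# available, which older drivers may lack).
out = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name",
     "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout

for line in out.strip().splitlines():
    pid, name = (field.strip() for field in line.split(",", 1))
    if "python" in name:                 # same filter as the grep above
        print("killing", pid, name)
        subprocess.run(["kill", "-9", pid])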

Comments


If you're unsure what process to kill [1]:

  1. Use nvtop to find the PIDs of the dead processes still hogging VRAM, and note which device index they are on

  2. Check fuser -v /dev/nvidia<device index> to find the user (change <device index> to the relevant integer) [2]

  3. Use htop -u <user> and kill the processes that seem to have hung and are close to the PID reported by nvtop


[1] nvidia-smi filters out processes if it gets an error in nvmlSystemGetProcessName [source]

[2] This is a complement to Kenan's answer.

Comments


When I interrupt model training and the GPU is left with its memory full, this helps me. But be careful: it also kills the Python kernel of your environment.

pkill -f python

Comments

I just started a new terminal and closed the old one and it worked out pretty well for me.

Comments
