
This question is related to my other question.

I tried running multiple machine-learning training processes in parallel (launched from bash). The programs are written in PyTorch. After a certain number of concurrent processes (10 in my case), I get the following error:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

As mentioned in this answer,

...it could occur because the VRAM memory limit was hit (which is rather non-intuitive from the error message).

For my case with PyTorch model training, decreasing batch size helped. You could try this or maybe decrease your model size to consume less VRAM.

I tried the solution mentioned here to enforce a per-process GPU memory limit, but the issue persists.

This problem does not occur with a single process, or with a smaller number of processes. Since only one context runs at any given instant, why does this cause a memory issue?

The issue occurs both with and without MPS. I expected it to occur with MPS but not otherwise, since MPS may run multiple processes in parallel.

  • Yeah, if you ask for too much memory, a computer may crash. This is not GPU specific, you can also try to allocate a 10000000GB array in your CPU and make your code crash. What is your question? Commented Nov 30, 2022 at 17:48
  • @AnderBiguri As stated, the problem doesn't occur with a single process of the same nature, but with 10 processes running concurrently. Why does this occur, since the GPU runs only 1 process at a time? Commented Nov 30, 2022 at 17:49
  • The GPU is a device purposely designed and built for parallel processing. Why do you think it only does one thing at a time? It will compute one thing at a time only when that computation is bigger than its processing power, but that's it. Many processes can run on the GPU simultaneously; this is absolutely OK and expected (e.g. you may be running your display and compute at any time). Check nvidia-smi to see all your different processes running at the same time on the GPU. Commented Nov 30, 2022 at 17:50
  • @AnderBiguri By simultaneously, do you mean in parallel? I understand why display and compute appear to be happening in parallel, but they are happening sequentially. Commented Nov 30, 2022 at 17:55
  • When the GPU is executing multiple processes (one after the other, for example by pre-emption), is the memory being utilized by multiple processes at the (exact) same time? Even by those that the GPU is not executing at the moment? Commented Nov 30, 2022 at 17:57

1 Answer


Since only one context runs at a single time instant, why does this cause memory issue?

Context-switching doesn't dump the contents of GPU "device" memory (i.e. DRAM) to some other location. If you run out of this device memory, context-switching doesn't alleviate that.

If you run multiple processes, the memory used by each process will add up (just like it does in the CPU space) and GPU context switching (or MPS or time-slicing) does not alleviate that in any way.
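A minimal sketch of this point, with hypothetical numbers (the per-process footprint and VRAM capacity below are assumptions for illustration, not measured values): each process's allocation stays resident in device memory for its whole lifetime, so aggregate demand grows linearly with the process count no matter how the GPU time-slices between contexts.

```python
# Hypothetical figures: per-process GPU memory demand adds up across
# concurrent processes, regardless of how the GPU context-switches.

GPU_VRAM_GB = 12.0       # assumed total device memory
PER_PROCESS_GB = 1.5     # assumed resident footprint of one training job

def total_demand(n_processes, per_process_gb=PER_PROCESS_GB):
    """Aggregate VRAM demand of n concurrent processes (GB)."""
    return n_processes * per_process_gb

def max_concurrent(vram_gb=GPU_VRAM_GB, per_process_gb=PER_PROCESS_GB):
    """Largest process count whose combined demand still fits."""
    return int(vram_gb // per_process_gb)

print(total_demand(10))   # 15.0 GB requested by 10 jobs...
print(max_concurrent())   # ...but only 8 jobs of 1.5 GB fit in 12 GB
```

Under these assumed numbers, the 9th and 10th jobs push aggregate demand past capacity, and the allocation failure surfaces in whichever process happens to request memory next, which is why the error message (a cuDNN algorithm-selection failure) looks unrelated to VRAM exhaustion.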

It's completely expected that if you run enough processes using the GPU, eventually you will run out of resources. Neither GPU context switching nor MPS nor time-slicing in any way affects the memory utilization per process.


Comments

As usual, Robert has been able to convey with better words what I meant in the comments ;). Thanks.
Thank you. That answers the issue. Are you aware of any solutions to limit this usage (PyTorch or TF specific)? The ones I mentioned in the question don't appear to work.
@abs Use less memory? Buy a bigger GPU? make sure you read the available GPU specs, and schedule accordingly?
@AnderBiguri Of course those are possible. I specifically asked solutions to limit the usage.
There are many questions here on SO that are PyTorch- or TF-specific and ask how to deal with GPU out-of-memory situations. I don't have any secrets to share beyond those. As a practical matter, my expectation is that well before you discovered how to go from running 10 training jobs at the same time to running 100 training jobs at the same time on the same GPU, you would run into other performance limits that would make the benefits of adding more jobs disappear.
