Sometimes when I get an OOM error, the LLM's parameters have already been loaded onto the GPU and are not cleared automatically.
So I tried this:
torch.cuda.empty_cache()
but it didn't work, so every time I have to restart my GPU to clear the cache.
Use with torch.no_grad(): during inference so no autograd graph (and its intermediate activations) is stored, and use mixed precision (torch.cuda.amp / torch.autocast) to cut activation memory roughly in half.
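A minimal sketch of both tips; the small Linear model is a stand-in for a real LLM, and CPU autocast with bfloat16 is used here only so the example runs on any machine (on a GPU you would pass device_type="cuda"):

```python
import torch

model = torch.nn.Linear(128, 10)  # placeholder for a real model
model.eval()
x = torch.randn(4, 128)

# no_grad(): no autograd graph is built, so intermediate
# activations are freed as soon as the forward pass finishes
with torch.no_grad():
    logits = model(x)

# autocast: runs matmuls in half precision to reduce activation memory
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits_amp = model(x)

print(logits.requires_grad)  # False: nothing is kept for backward
print(logits_amp.dtype)      # torch.bfloat16
```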
torch.cuda.empty_cache() cannot free memory that is still referenced by live Python objects; it only releases cached blocks that are already unoccupied. To truly free GPU memory: del the variables that reference the tensors (including the model itself), call gc.collect() to break reference cycles, and only then call torch.cuda.empty_cache(). Note that after an OOM inside a try/except, the exception traceback can also keep tensors alive until it goes out of scope.
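A sketch of that cleanup order, assuming a hypothetical model variable standing in for the loaded LLM (the CUDA calls are guarded so the snippet also runs on CPU-only machines):

```python
import gc
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)  # placeholder for an LLM

# 1) drop every Python reference first; empty_cache() alone
#    cannot release memory that live tensors still occupy
del model

# 2) collect reference cycles that may still hold tensors alive
gc.collect()

# 3) now return the cached, unoccupied blocks to the driver
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    # should be 0 (or near it) if nothing else is resident
    print(torch.cuda.memory_allocated())
```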