
After running this code in a Jupyter notebook, it runs properly. However, the model is still stored in GPU memory. How do I get rid of it to free up space on my GPU? Sorry if I am formatting this question poorly; I am not used to posting. Here is the code:

from vllm import LLM
import torch

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.7,
    max_model_len=2048,
)

llm = LLM(
    model=model_path,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    max_model_len=2048,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    gpu_memory_utilization=0.8,
)

I tried deleting llm and clearing the cache, which decreases the allocated and cached memory, but I cannot rerun the LLM constructor without getting an OOM error (the previous call still holds on to memory).


2 Answers


Well, how about killing the vLLM-related process with pkill -9 -ef <part or all of the vLLM process name or CLI command>? You can check which vLLM process is consuming GPU memory with nvidia-smi, nvitop, or nvtop.



I regularly use the "holy trinity" of cleanup with PyTorch:

  1. Delete the model object with Python's del
  2. Empty the CUDA cache with torch.cuda.empty_cache()
  3. Run Python garbage collection with gc.collect() (import gc at the top of the script)
import gc
import torch

del llm
torch.cuda.empty_cache()
gc.collect()

That said, Jupyter notebooks are weird; you may just have to restart the kernel if these don't work, since Jupyter has its own caching mechanisms.
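One such mechanism is the notebook's output cache (Out[n], _, __), which silently keeps references to objects displayed in cells, so del alone never drops the last reference. Here is a minimal CPU-only sketch (no GPU or vLLM needed; FakeModel is a hypothetical stand-in for a large model) showing how a hidden alias keeps an object alive through del and gc.collect():

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a large model object."""
    pass

model = FakeModel()
alias = model               # hidden reference, like Jupyter's Out[] cache
ref = weakref.ref(model)    # lets us observe when the object is collected

del model                   # drop our reference...
gc.collect()
print(ref() is None)        # False: the alias still keeps the object alive

del alias                   # drop the hidden reference too
gc.collect()
print(ref() is None)        # True: the object is actually freed
```

The same logic applies on the GPU: torch.cuda.empty_cache() can only release memory whose tensors are no longer referenced anywhere, which is why a stale notebook output entry can pin gigabytes until the kernel restarts.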

