After running this code in a Jupyter notebook, it completes successfully. However, the model's memory remains allocated on the GPU. How do I release this memory to free up space on my GPU? Sorry if I am formatting this question poorly; I am not used to posting. Here is the code:
```python
import torch
from vllm import LLM

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.7,
    max_model_len=2048,
)

# I also tried this second configuration:
llm = LLM(
    model=model_path,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    max_model_len=2048,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    gpu_memory_utilization=0.8,
)
```
I tried deleting `llm` and clearing the cache, which decreases the allocated and cached memory, but I cannot re-run the `LLM` constructor without getting an OOM error (the previous call's memory is still held).
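For reference, the cleanup I attempted looks roughly like this sketch (the `llm` placeholder stands in for the vLLM engine created above; the `torch` import is guarded so the snippet illustrates the pattern even without a GPU):

```python
import contextlib
import gc

# Hypothetical stand-in for the vLLM engine created above;
# in the notebook this is the actual `llm` object.
llm = object()

# Drop the last Python reference so the object can be garbage-collected.
del llm
gc.collect()

# Return cached CUDA blocks to the driver (no-op if torch/CUDA are absent).
with contextlib.suppress(ImportError):
    import torch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Even after these steps, `nvidia-smi` still shows most of the memory as in use, so re-instantiating `LLM` fails with OOM.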