I'm planning to build a Django app that serves a fine-tuned LLM based on Hugging Face models, and I want the generated text streamed from the model to Django. My issue is that the model streams its output to the terminal, but I'm not sure how to capture that stream in a variable so I can forward it to the end user.
I tried using pipeline with TextStreamer, but it only prints the text to the terminal:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_id = "model_a"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

# TextStreamer prints decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)

# renamed from "pipeline" so the variable does not shadow the factory function
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer
)
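For context on what I think is happening: TextStreamer is hard-wired to print to stdout, whereas transformers also provides TextIteratorStreamer, which puts the chunks on a queue so the caller can iterate over them. Below is a minimal stdlib-only sketch of that producer/consumer pattern, with a toy ChunkStreamer standing in for TextIteratorStreamer and fake_generate standing in for model.generate (both names and the chunk contents are made up for illustration):

```python
import threading
import queue

class ChunkStreamer:
    """Toy stand-in for transformers.TextIteratorStreamer."""
    _END = object()  # sentinel marking the end of generation

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, text):
        # Called from the producer (generation) thread for each text chunk.
        self._queue.put(text)

    def end(self):
        # Called once when generation finishes.
        self._queue.put(self._END)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._END:
                return
            yield item

def fake_generate(streamer):
    # In real code this would be: model.generate(**inputs, streamer=streamer)
    for chunk in ["Hello", ", ", "world", "!"]:
        streamer.put(chunk)
    streamer.end()

streamer = ChunkStreamer()
thread = threading.Thread(target=fake_generate, args=(streamer,))
thread.start()

# The chunks now arrive in a variable instead of being printed, so they
# could be yielded one by one from a Django streaming view instead.
generated = ""
for chunk in streamer:
    generated += chunk
thread.join()
print(generated)  # -> Hello, world!
```

In the real app, I believe the equivalent is to create transformers.TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True), launch model.generate in a threading.Thread with that streamer, and pass the streamer (it is already an iterator) to django.http.StreamingHttpResponse in the view; the exact view wiring is an assumption about the setup.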