
I'm planning to build a Django app that serves an LLM fine-tuned from a Hugging Face model, and I want the model's output streamed to Django. My issue is that the model streams its output to the terminal, but I'm not sure how to capture that stream in a variable so I can forward it to the end user.

I tried using `pipeline` with `TextStreamer`, but it only prints the text to the terminal:


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_id = "model_a"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated
pipe = pipeline(  # renamed so the `pipeline` factory function isn't shadowed
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    streamer=streamer
)

1 Answer


What you're looking for is `TextIteratorStreamer`, which exposes the generated text as a Python iterator. If you replace `TextStreamer` with `TextIteratorStreamer`, you can run generation in a background thread and iterate over the streamer to yield tokens as they arrive:

    from threading import Thread

    def stream_response(question):
        def get_response(question):
            # Runs the chain; tokens are pushed into `streamer` as they are generated
            YOUR_LLM_CHAIN(question)

        generation_thread = Thread(target=get_response, args=(question,))
        generation_thread.start()

        # The streamer blocks until the next token is available,
        # then yields it, until generation finishes
        for new_token in streamer:
            yield new_token
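Under the hood, `TextIteratorStreamer` is essentially a thread-safe queue that the generation thread fills and the consumer iterates over. A minimal standard-library stand-in of that pattern (all names here are illustrative, not part of transformers) can be handy for testing the Django plumbing without loading a model:

```python
import queue
import threading

STOP = object()  # sentinel marking the end of generation


class ToyIteratorStreamer:
    """Minimal stand-in for TextIteratorStreamer: a thread-safe queue
    filled by the producer thread and drained by the consuming iterator."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, token):
        self._queue.put(token)

    def end(self):
        self._queue.put(STOP)

    def __iter__(self):
        while True:
            item = self._queue.get()  # blocks until the next token arrives
            if item is STOP:
                break
            yield item


def fake_generate(streamer):
    # Stands in for model.generate(..., streamer=streamer)
    for token in ["Why ", "did ", "the ", "chicken..."]:
        streamer.put(token)
    streamer.end()


streamer = ToyIteratorStreamer()
threading.Thread(target=fake_generate, args=(streamer,)).start()
text = "".join(streamer)
print(text)  # Why did the chicken...
```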

With the response served as `text/event-stream`, you can push each new token to the client as it is generated.
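For the Django side, one way to wire this up is a view returning a `StreamingHttpResponse` that wraps the streamer in server-sent-event framing. This is a sketch, not a drop-in solution: it assumes `model` and `tokenizer` are loaded as in the question, and names like `stream_completion` and `sse_events` are hypothetical:

```python
from threading import Thread


def sse_events(streamer):
    """Wrap each raw token in text/event-stream framing."""
    for token in streamer:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


def stream_completion(request):
    # Imported inside the view so the helper above stays framework-free;
    # requires Django and transformers to be installed.
    from django.http import StreamingHttpResponse
    from transformers import TextIteratorStreamer

    prompt = request.GET.get("prompt", "Tell me a joke")
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Run generation in the background; tokens flow into `streamer`
    Thread(
        target=model.generate,
        kwargs={**inputs, "streamer": streamer, "max_new_tokens": 256},
    ).start()
    return StreamingHttpResponse(
        sse_events(streamer), content_type="text/event-stream"
    )
```

`skip_prompt=True` keeps the echoed prompt out of the stream, and `skip_special_tokens=True` is forwarded to the tokenizer's decode step.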
