
I'm planning to build a Django app that serves an LLM fine-tuned from a Hugging Face model, and I want the model's output streamed to Django. My issue is that the model streams its output to the terminal, but I'm not sure how to capture that stream in a variable so I can forward it to the end user.

I tried using `pipeline` with `TextStreamer`, but it only prints the text to the terminal:


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_id = "model_a"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated
pipe = pipeline(  # renamed so the `pipeline` factory function isn't shadowed
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    streamer=streamer
)

1 Answer


What you're looking for is `TextIteratorStreamer`, which exposes the generated text as a Python iterator. If you replace `TextStreamer` with `TextIteratorStreamer`, you can run generation in a background thread and iterate over the streamer to yield tokens as they arrive:

    from threading import Thread

    def stream_response(question):
        def get_response(question):
            # Runs the chain; tokens are pushed into `streamer` as they are generated
            YOUR_LLM_CHAIN(question)

        generation_thread = Thread(target=get_response, args=(question,))
        generation_thread.start()

        # The streamer blocks until the next token is available,
        # then yields it, until generation finishes
        for new_token in streamer:
            yield new_token
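Under the hood, `TextIteratorStreamer` is essentially a thread-safe queue that the generation thread fills and the consumer iterates over. A minimal standard-library stand-in of that pattern (all names here are illustrative, not part of transformers) can be handy for testing the Django plumbing without loading a model:

```python
import queue
import threading

STOP = object()  # sentinel marking the end of generation


class ToyIteratorStreamer:
    """Minimal stand-in for TextIteratorStreamer: a thread-safe queue
    filled by the producer thread and drained by the consuming iterator."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, token):
        self._queue.put(token)

    def end(self):
        self._queue.put(STOP)

    def __iter__(self):
        while True:
            item = self._queue.get()  # blocks until the next token arrives
            if item is STOP:
                break
            yield item


def fake_generate(streamer):
    # Stands in for model.generate(..., streamer=streamer)
    for token in ["Why ", "did ", "the ", "chicken..."]:
        streamer.put(token)
    streamer.end()


streamer = ToyIteratorStreamer()
threading.Thread(target=fake_generate, args=(streamer,)).start()
text = "".join(streamer)
print(text)  # Why did the chicken...
```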

With the response served as `text/event-stream`, you can push each new token to the client as it is generated.
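For the Django side, one way to wire this up is a view returning a `StreamingHttpResponse` that wraps the streamer in server-sent-event framing. This is a sketch, not a drop-in solution: it assumes `model` and `tokenizer` are loaded as in the question, and names like `stream_completion` and `sse_events` are hypothetical:

```python
from threading import Thread


def sse_events(streamer):
    """Wrap each raw token in text/event-stream framing."""
    for token in streamer:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


def stream_completion(request):
    # Imported inside the view so the helper above stays framework-free;
    # requires Django and transformers to be installed.
    from django.http import StreamingHttpResponse
    from transformers import TextIteratorStreamer

    prompt = request.GET.get("prompt", "Tell me a joke")
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Run generation in the background; tokens flow into `streamer`
    Thread(
        target=model.generate,
        kwargs={**inputs, "streamer": streamer, "max_new_tokens": 256},
    ).start()
    return StreamingHttpResponse(
        sse_events(streamer), content_type="text/event-stream"
    )
```

`skip_prompt=True` keeps the echoed prompt out of the stream, and `skip_special_tokens=True` is forwarded to the tokenizer's decode step.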
