I am trying to create a ReAct agent in LlamaIndex using a local gpt-oss-20b model.

I have successfully loaded my local model using HuggingFaceLLM from llama_index.llms.huggingface and it seems to be working correctly. Here is the code I'm using for that part:

import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# This part works fine
llm = HuggingFaceLLM(
    model_name="../gpt-oss-20b-local",
    tokenizer_name="../gpt-oss-20b-local",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16},
)

Now, I want to use this llm to create an agent. I am following the official LlamaIndex documentation for the ReAct Agent: https://docs.llamaindex.ai/en/stable/examples/agent/react_agent/

The documentation provides an example for streaming events that looks like this:

# (Assuming agent and handler are already defined as per the docs)
# ... agent setup code ...

async for ev in handler.stream_events():
    print(ev)
    print("---")

When I try to add this loop to my script, I get a syntax error because it's not inside an async function.

My Full (Simplified) Code:

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Assume 'llm' is loaded as shown above

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

# Setup agent
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

agent = ReActAgent.from_tools(
    [multiply_tool], 
    llm=llm, 
    verbose=True,
    callback_manager=callback_manager
)

# This is the problematic part from the documentation
response = agent.stream_chat("What is 21 * 21?")
handler = llama_debug.get_event_handler("stream_chat")

# The following line causes the error
async for ev in handler.stream_events():
    print(ev)
    print("---")

The Error:

File "test.py", line 46
  async for ev in handler.stream_events():
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: 'async for' outside async function

I understand that async for must be within an async def function, but the LlamaIndex documentation presents the code this way. How am I supposed to run this example code?

Do I need to wrap this logic in a main async function and then run it with asyncio.run()? What is the correct way to execute these asynchronous streaming examples from the documentation in a standard Python script?

  • You can't use async for (or await) outside an async function. You may need to put the code in an async function and run it with asyncio.run(your_function()). Did you try it? Commented Aug 30 at 11:10
  • Maybe the documentation shows how to use it in a Jupyter Notebook, which probably already runs an async loop. In fact, the documentation even contains the text "If you're opening this Notebook on colab ..." Commented Aug 30 at 11:12
  • @furas does it need a context? I want to use each user's message history Commented Aug 31 at 7:40

1 Answer

The documentation says:

If you're opening this Notebook on colab, ...

so it assumes you will run it on Google Colab (or a similar server),
or in a local Jupyter Notebook or JupyterLab.

And a Notebook is already running an async event loop, which is needed to run async code.
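You can check this yourself: `asyncio.get_running_loop()` succeeds inside a Notebook cell but raises in a plain script (a small sketch, nothing LlamaIndex-specific):

```python
import asyncio

# In Jupyter/Colab an event loop is already running, so top-level
# 'await' works; in a plain script there is no loop yet, so you
# need asyncio.run() to start one.
try:
    asyncio.get_running_loop()
    print("event loop already running (Notebook-style environment)")
except RuntimeError:
    print("no running loop; wrap your code in asyncio.run(...)")
```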

If you want to run it as a normal script, then you have to put the code in an async def function and run it with asyncio.run(function()).
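The general pattern looks like this (sketched with a stand-in async generator in place of handler.stream_events(), so it runs without LlamaIndex):

```python
import asyncio

async def stream_events():
    """Stand-in for handler.stream_events() from the docs."""
    for i in range(3):
        await asyncio.sleep(0)  # yield control to the event loop
        yield f"event-{i}"

async def main():
    # 'async for' is legal here because we are inside an async def
    async for ev in stream_events():
        print(ev)
        print("---")

asyncio.run(main())  # starts the event loop and runs main() to completion
```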


Minimal working code based on the example from the documentation:

import asyncio
import os

from llama_index.llms.openai import OpenAI
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context

os.environ["OPENAI_API_KEY"] = "sk-..."
# print(f"{os.getenv('OPENAI_API_KEY') = }")  # check if it is already set

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

async def main():
    llm = OpenAI(model="gpt-4o-mini")
    agent = ReActAgent(tools=[multiply, add], llm=llm)

    # Create a context to store the conversation history/session state
    ctx = Context(agent)

    from llama_index.core.agent.workflow import AgentStream, ToolCallResult

    handler = agent.run("What is 20+(2*4)?", ctx=ctx)

    async for ev in handler.stream_events():
        # if isinstance(ev, ToolCallResult):
        #     print(f"\nCall {ev.tool_name} with {ev.tool_kwargs}\nReturned: {ev.tool_output}")
        if isinstance(ev, AgentStream):
            print(f"{ev.delta}", end="", flush=True)

    response = await handler

    print(response)

# ---

asyncio.run(main())