I have been reading the documentation all day and can't wrap my head around how to create a VectorStoreIndex with llama_index and use the resulting embeddings as supplemental context for a RAG application/chatbot that converses with a user. I want to use llama_index because it supports more advanced retrieval techniques such as sentence window retrieval and auto-merging retrieval (to be fair, I have not investigated whether LangChain also supports these kinds of retrieval methods). I want to use LangChain for its functionality for building more complex prompt templates (similarly, I have not really investigated whether llama_index supports this).
My goal is ultimately to evaluate how these different retrieval methods perform within the context of the application/chatbot. I know how to evaluate them with a separate file of evaluation questions, but I would also like to compare things like response latency, the "humanness" of responses, and token usage.
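For the latency/token side, this is roughly what I have in mind (a sketch only; timed_invoke is my own placeholder helper, and I'm assuming get_openai_callback picks up the OpenAI calls made inside the chain defined below):

import time
from langchain.callbacks import get_openai_callback

def timed_invoke(chain, messages):
    """Invoke the chat chain, recording wall-clock latency and OpenAI token usage."""
    start = time.perf_counter()
    with get_openai_callback() as cb:  # counts tokens for OpenAI calls inside the block
        response = chain.invoke({"messages": messages})
    return response, time.perf_counter() - start, cb.total_tokens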
The code for a minimal reproducible example is as follows:
- LangChain chatbot initialization
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ChatMessageHistory
from langchain_openai import ChatOpenAI

llm_model = "gpt-3.5-turbo"  # placeholder; use whichever chat model you prefer

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are the world's greatest... \
Use this document base to help you provide the best support possible to everyone you engage with.
""",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chat = ChatOpenAI(model=llm_model, temperature=0.7)
chain = prompt | chat

chat_history = ChatMessageHistory()
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    chat_history.add_user_message(user_input)
    response = chain.invoke({"messages": chat_history.messages})
    print("AI:", response.content)
    chat_history.add_ai_message(response.content)
- LlamaIndex sentence window retrieval
import os

from llama_index.core import (
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import LLMRerank, MetadataReplacementPostProcessor

class SentenceWindowUtils:
    def __init__(self, documents, llm, embed_model, sentence_window_size):
        self.documents = documents
        self.llm = llm
        self.embed_model = embed_model
        self.sentence_window_size = sentence_window_size
        # self.save_dir = save_dir
        # Parse documents into single-sentence nodes, storing the surrounding
        # window of sentences in each node's metadata.
        self.node_parser = SentenceWindowNodeParser.from_defaults(
            window_size=self.sentence_window_size,
            window_metadata_key="window",
            original_text_metadata_key="original_text",
        )
        self.sentence_context = ServiceContext.from_defaults(
            llm=self.llm,
            embed_model=self.embed_model,
            node_parser=self.node_parser,
        )

    def build_sentence_window_index(self, save_dir):
        # Build the index once and persist it; reload it on subsequent runs.
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)
            sentence_index = VectorStoreIndex.from_documents(
                self.documents, service_context=self.sentence_context
            )
            sentence_index.storage_context.persist(persist_dir=save_dir)
        else:
            sentence_index = load_index_from_storage(
                StorageContext.from_defaults(persist_dir=save_dir),
                service_context=self.sentence_context,
            )
        return sentence_index

    def get_sentence_window_query_engine(self, sentence_index, similarity_top_k=6, rerank_top_n=3):
        # Replace each retrieved sentence with its full window, then rerank with the LLM.
        postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
        rerank = LLMRerank(top_n=rerank_top_n, service_context=self.sentence_context)
        sentence_window_engine = sentence_index.as_query_engine(
            similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
        )
        return sentence_window_engine

# documents, llm, and embed_model are loaded elsewhere
sentence_window = SentenceWindowUtils(
    documents=documents, llm=llm, embed_model=embed_model, sentence_window_size=1
)
sentence_window_1 = sentence_window.build_sentence_window_index(save_dir='./indexes/sentence_window_index_1')
sentence_window_engine_1 = sentence_window.get_sentence_window_query_engine(sentence_window_1)
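Querying the engine on its own works as expected, e.g. (with a made-up question):

response = sentence_window_engine_1.query("What does the document base say about X?")
print(response.response)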
Both blocks of code run independently. The goal, though, is that when a query warrants retrieval from the existing document base, the chatbot can use the sentence_window_engine that was built. I suppose I could retrieve the relevant information for the query and then pass it into a subsequent prompt (sketched below), but I would like to avoid including the document data in a prompt.
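For reference, that fallback (retrieve first, then inject the context into the prompt) would look roughly like this; the retrieve_context glue function and the {context} slot in the prompt are my own placeholders, and it is exactly the prompt-stuffing I'd prefer to avoid:

from langchain_core.runnables import RunnableLambda

def retrieve_context(inputs):
    # Route the latest user message through the llama_index query engine.
    question = inputs["messages"][-1].content
    return str(sentence_window_engine_1.query(question))

rag_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are the world's greatest...\n"
            "Relevant context from the document base:\n{context}",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

rag_chain = (
    {
        "context": RunnableLambda(retrieve_context),
        "messages": lambda x: x["messages"],
    }
    | rag_prompt
    | chat
)

# Used the same way as the plain chain:
# response = rag_chain.invoke({"messages": chat_history.messages})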
Any suggestions?