I have been reading the documentation all day and can't wrap my head around how to create a VectorStoreIndex with llama_index and use the resulting embeddings as supplemental context for a RAG application/chatbot that converses with a user. I want to use llama_index because it supports more advanced retrieval techniques such as sentence window retrieval and auto-merging retrieval (to be fair, I have not investigated whether LangChain also supports these kinds of retrieval methods). I want to use LangChain for its functionality for building more complex prompt templates (similarly, I have not really investigated whether llama_index supports this).
My goal is ultimately to evaluate how these different retrieval methods perform within the context of the application/chatbot. I know how to evaluate them with a separate file of evaluation questions, but I would also like to compare things like response latency, the "humanness" of responses, and token usage.
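For the latency/token side, this is roughly what I have in mind (a sketch only; timed_invoke is my own placeholder helper, and I'm assuming get_openai_callback picks up the OpenAI calls made inside the chain defined below):

import time
from langchain.callbacks import get_openai_callback

def timed_invoke(chain, messages):
    """Invoke the chat chain, recording wall-clock latency and OpenAI token usage."""
    start = time.perf_counter()
    with get_openai_callback() as cb:  # counts tokens for OpenAI calls inside the block
        response = chain.invoke({"messages": messages})
    return response, time.perf_counter() - start, cb.total_tokens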
The code for a minimal reproducible example is as follows:
- LangChain chatbot initialization
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ChatMessageHistory
from langchain_openai import ChatOpenAI

llm_model = "gpt-3.5-turbo"  # placeholder; use whichever chat model you prefer

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are the world's greatest... \
Use this document base to help you provide the best support possible to everyone you engage with.
""",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chat = ChatOpenAI(model=llm_model, temperature=0.7)
chain = prompt | chat

chat_history = ChatMessageHistory()
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    chat_history.add_user_message(user_input)
    response = chain.invoke({"messages": chat_history.messages})
    print("AI:", response.content)
    chat_history.add_ai_message(response.content)
- LlamaIndex sentence window retrieval
import os

from llama_index.core import (
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import LLMRerank, MetadataReplacementPostProcessor

class SentenceWindowUtils:
    def __init__(self, documents, llm, embed_model, sentence_window_size):
        self.documents = documents
        self.llm = llm
        self.embed_model = embed_model
        self.sentence_window_size = sentence_window_size
        # self.save_dir = save_dir
        # Parse documents into single-sentence nodes, storing the surrounding
        # window of sentences in each node's metadata.
        self.node_parser = SentenceWindowNodeParser.from_defaults(
            window_size=self.sentence_window_size,
            window_metadata_key="window",
            original_text_metadata_key="original_text",
        )
        self.sentence_context = ServiceContext.from_defaults(
            llm=self.llm,
            embed_model=self.embed_model,
            node_parser=self.node_parser,
        )

    def build_sentence_window_index(self, save_dir):
        # Build the index once and persist it; reload it on subsequent runs.
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)
            sentence_index = VectorStoreIndex.from_documents(
                self.documents, service_context=self.sentence_context
            )
            sentence_index.storage_context.persist(persist_dir=save_dir)
        else:
            sentence_index = load_index_from_storage(
                StorageContext.from_defaults(persist_dir=save_dir),
                service_context=self.sentence_context,
            )
        return sentence_index

    def get_sentence_window_query_engine(self, sentence_index, similarity_top_k=6, rerank_top_n=3):
        # Replace each retrieved sentence with its full window, then rerank with the LLM.
        postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
        rerank = LLMRerank(top_n=rerank_top_n, service_context=self.sentence_context)
        sentence_window_engine = sentence_index.as_query_engine(
            similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
        )
        return sentence_window_engine

# documents, llm, and embed_model are loaded elsewhere
sentence_window = SentenceWindowUtils(
    documents=documents, llm=llm, embed_model=embed_model, sentence_window_size=1
)
sentence_window_1 = sentence_window.build_sentence_window_index(save_dir='./indexes/sentence_window_index_1')
sentence_window_engine_1 = sentence_window.get_sentence_window_query_engine(sentence_window_1)
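Querying the engine on its own works as expected, e.g. (with a made-up question):

response = sentence_window_engine_1.query("What does the document base say about X?")
print(response.response)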
Both blocks of code run independently. The goal, though, is that when a query warrants retrieval from the existing document base, the chatbot can use the sentence_window_engine that was built. I suppose I could retrieve the relevant information for the query and then pass it into a subsequent prompt (sketched below), but I would like to avoid including the document data in a prompt.
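For reference, that fallback (retrieve first, then inject the context into the prompt) would look roughly like this; the retrieve_context glue function and the {context} slot in the prompt are my own placeholders, and it is exactly the prompt-stuffing I'd prefer to avoid:

from langchain_core.runnables import RunnableLambda

def retrieve_context(inputs):
    # Route the latest user message through the llama_index query engine.
    question = inputs["messages"][-1].content
    return str(sentence_window_engine_1.query(question))

rag_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are the world's greatest...\n"
            "Relevant context from the document base:\n{context}",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

rag_chain = (
    {
        "context": RunnableLambda(retrieve_context),
        "messages": lambda x: x["messages"],
    }
    | rag_prompt
    | chat
)

# Used the same way as the plain chain:
# response = rag_chain.invoke({"messages": chat_history.messages})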
Any suggestions?