Currently, I want to build a RAG chatbot for production. I already have my own LLM API, and I want to create a custom LLM wrapper and then use it in the RetrievalQA.from_chain_type function. I don't know whether LangChain supports this in my case.
I read about this topic on Reddit: https://www.reddit.com/r/LangChain/comments/17v1rhv/integrating_llm_rest_api_into_a_langchain/ and in the LangChain documentation: https://python.langchain.com/docs/modules/model_io/llms/custom_llm
But it still does not work when I apply the custom LLM to qa_chain. Below is my code; I hope you can help. Sorry for my language, English is not my mother tongue.
import requests
from typing import Any, List, Mapping, Optional

from pydantic import Extra  # pydantic v1 API, as used by this LangChain version

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM


class LlamaLLM(LLM):
    llm_url: str = "https://myhost/llama/api"

    class Config:
        extra = Extra.forbid

    @property
    def _llm_type(self) -> str:
        return "Llama2 7B"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")

        payload = {
            "inputs": prompt,
            "parameters": {"max_new_tokens": 100},
            "token": "abcdfejkwehr",
        }
        headers = {"Content-Type": "application/json"}

        response = requests.post(self.llm_url, json=payload, headers=headers, verify=False)
        response.raise_for_status()

        # print("API Response:", response.json())
        return response.json()["generated_text"]  # extract the generated text from the API response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"llmUrl": self.llm_url}
llm = LlamaLLM()
# Testing
prompt = "[INST] Question: Who is Albert Einstein? \n Answer: [/INST]"
result = llm._call(prompt)
print(result)
Albert Einstein (1879-1955) was a German-born theoretical physicist who is widely regarded as one of the most influential scientists of the 20th century. He is best known for his theory of relativity, which revolutionized our understanding of space and time, and his famous equation E=mc².
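As a side note, _call is LangChain's internal hook; the public entry point is calling the LLM instance directly, which also runs any registered callbacks. Either works for a quick smoke test:

result = llm(prompt)  # equivalent to llm._call(prompt), plus callback handling
print(result)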
# Build prompt
from langchain.prompts import PromptTemplate
template = """[INST] <<SYS>>
Answer the question based on the context below.
<</SYS>>
Context: {context}
Question: {question}
Answer:
[/INST]"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
# Run chain
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    verbose=True,
    # retriever=vectordb.as_retriever(),
    retriever=custom_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]
Encountered some errors. Please recheck your request!
The custom retriever in my case combines retrieval and reranking. I already tested it on its own and it works.
I also tested with the normal retriever, but the chain still doesn't work, so I don't think the retriever is the cause of the error:
retriever=vectordb.as_retriever()
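To double-check whether the failure comes from the retriever, the prompt, or the LLM call, the "stuff" step can be reproduced by hand. A minimal sketch, assuming custom_retriever implements LangChain's retriever interface:

# 1. Retrieve the documents the chain would stuff into the prompt.
docs = custom_retriever.get_relevant_documents(question)

# 2. Build the exact prompt the chain would send.
context = "\n\n".join(doc.page_content for doc in docs)
manual_prompt = QA_CHAIN_PROMPT.format(context=context, question=question)

# 3. Call the LLM directly; a very long prompt here points to a
#    context-length problem on the API side.
print(len(manual_prompt))
print(llm(manual_prompt))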
Besides, there is also a warning about an insecure request, but I don't know whether it affects the requests (I also don't know how to fix it):
/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'myhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Encountered some errors. Please recheck your request!
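The InsecureRequestWarning is separate from the chain error: it only means certificate verification is disabled because of verify=False. The proper fix is to verify against the CA that signed the server certificate; silencing the warning is a development-only stop-gap. A sketch (the CA bundle path is a placeholder):

import requests
import urllib3

# Preferred: point requests at the CA bundle that signed the server cert,
# e.g. in _call:
#     requests.post(self.llm_url, json=payload, headers=headers,
#                   verify="/path/to/ca-bundle.pem")

# Development-only stop-gap: keep verify=False and silence the warning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)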
Moreover, below is the API call format that I have. Does it have any problem?
curl --location 'https://myhost:10001/llama/api' -k \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": "[INST] Question: Who is Albert Einstein? \n Answer: [/INST]",
    "parameters": {"max_new_tokens": 100},
    "token": "abcdfejkwehr"
}'
Update: this happened because of the context length setting of the API. I already fixed it and it works fine now.
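For anyone hitting the same error: the stuffed context can push the prompt past the API's window. One way to guard against that is to make the limits configurable on the wrapper and trim the prompt before sending it. A minimal sketch, using a rough character budget (12000 here is a placeholder; tune it to the server's real context length, and a tokenizer-based count would be more accurate):

import requests
from typing import Any, List, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM


class LlamaLLM(LLM):
    llm_url: str = "https://myhost/llama/api"
    max_new_tokens: int = 100
    max_prompt_chars: int = 12000  # crude character proxy for the token limit

    @property
    def _llm_type(self) -> str:
        return "Llama2 7B"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        # Keep the tail of the prompt so the question (at the end) survives;
        # this may cut into the stuffed context if it is very long.
        prompt = prompt[-self.max_prompt_chars:]
        payload = {
            "inputs": prompt,
            "parameters": {"max_new_tokens": self.max_new_tokens},
            "token": "abcdfejkwehr",
        }
        response = requests.post(
            self.llm_url,
            json=payload,
            headers={"Content-Type": "application/json"},
            verify=False,
        )
        response.raise_for_status()
        return response.json()["generated_text"]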
