
I am using LangChain to read data from a PDF and convert it into chunks of text. I then embed the chunks into vectors and load them into a vector store using Pinecone. I am getting a MaxRetryError.

I guess I am loading all the chunks at once, which may be causing the issue. Is there some function like add_documents that can be used to load the data/chunks one by one?

def load_document(file):
    from langchain.document_loaders import PyPDFLoader
    print(f'Loading {file} ..')
    loader = PyPDFLoader(file)
    # load() returns a list of LangChain documents, one per page
    data = loader.load()
    return data


data = load_document("DATA/capacitance.pdf")
# prints the content of the second page
print(data[1].page_content)
print(data[2].metadata)

# chunking
def chunk_data(data, chunk_size=256):
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
    chunks = text_splitter.split_documents(data)
    print(type(chunks))
    return chunks

chunks = chunk_data(data)
print(len(chunks))
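As a quick sanity check before spending embedding credits, it helps to eyeball one chunk and the total volume of text about to be embedded:

print(chunks[0].page_content[:200])  # first 200 characters of the first chunk
total_chars = sum(len(c.page_content) for c in chunks)
print(f'{len(chunks)} chunks, ~{total_chars} characters in total')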

Up to the chunking step my code works well: it loads the PDF, converts it to text, and chunks the data. When it comes to embedding, however, I run into trouble. I tried both Pinecone and FAISS. For Pinecone I had already created an index 'electrostatics':

pinecone.create_index('electrostatics', dimension=1536, metric='cosine')
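Since create_index raises an error if the index already exists, a guard like the following helps on re-runs; a minimal sketch, assuming the same pinecone-client v2 API used below (list_indexes() returns the names of existing indexes):

import pinecone

index_name = 'electrostatics'
# only create the index if it does not exist yet
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric='cosine')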

import os
from dotenv import load_dotenv
load_dotenv("D:/test/.env")
# just checking that the key was picked up
print(os.environ.get("OPENAI_API_KEY"))

def insert_embeddings(index_name, chunks):
    import pinecone
    from langchain.vectorstores import Pinecone
    from langchain.embeddings.openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    pinecone.init(api_key=os.environ.get("PINECONE_API_KEY"), environment=os.environ.get("PINECONE_ENV"))
    # embeds every chunk and uploads all of them to the index in one call
    vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
    print("Ok")
    return vector_store

I tried embedding in the following ways:

index_name = 'electrostatics'
vector_store = insert_embeddings(index_name, chunks)

With FAISS:

from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
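For reference, once FAISS.from_documents succeeds, the index can be queried and persisted locally; a minimal sketch using the wrapper's similarity_search, save_local, and load_local methods ("faiss_index" is just an arbitrary folder name):

# query the in-memory index
results = db.similarity_search("What is capacitance?", k=3)
for doc in results:
    print(doc.page_content[:100])

# save to disk and reload later
db.save_local("faiss_index")
db = FAISS.load_local("faiss_index", embeddings)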

Instead, both approaches fail with this error:

[screenshot of the error traceback]

Comments:
  • You have simply hit your OpenAI API quota. Commented Jul 11, 2023 at 8:13
  • As above, definitely an OpenAI API quota issue. Here are some additional helpful threads: community.openai.com/t/i-am-getting-ratelimiterror/124977 and community.openai.com/t/rate-limit-error/14769 Commented Jul 11, 2023 at 10:30
  • Is it because the chunk size is high? Should I split it and try? Commented Jul 11, 2023 at 11:56
  • You have exhausted your monthly limit for the API, i.e. you have consumed all the credits allocated to your plan. More on this. You would need to upgrade to a new plan. Commented Jul 11, 2023 at 12:50

1 Answer


This error typically happens when there are connection issues or timeouts. I think it is better to insert the data in batches, like this:

def insert_embeddings(index_name, chunks):
    import pinecone
    from langchain.vectorstores import Pinecone
    from langchain.embeddings.openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    pinecone.init(api_key=os.environ.get("PINECONE_API_KEY"), environment=os.environ.get("PINECONE_ENV"))

    # connect to the existing index instead of uploading everything in one call
    vector_store = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)

    # insert the chunks in batches so one failed request
    # doesn't take down the whole upload
    batch_size = 100  # define your preferred batch size
    for i in range(0, len(chunks), batch_size):
        chunk_batch = chunks[i:i + batch_size]
        vector_store.add_documents(chunk_batch)

    print("Ok")
    return vector_store

