Text Conversion to Embedding and Upserting to the Pinecone Vector DB

Question

I am learning generative AI and came across the following script. It worked for the trainer but does not work for me, possibly because of recent version changes. I am trying to convert text to vector embeddings and then upload them to the Pinecone vector DB. Can someone help me find where I am making a mistake?

I am running these commands in a Jupyter Notebook.

Error:

AttributeError                            Traceback (most recent call last)
Cell In[44], line 8
      5 index_name="medical-bot"
      7 #Creating Embeddings for Each of The Text Chunks & storing
----> 8 docsearch=pc.from_texts([t.page_content for t in text_chunks], embeddings, index_name=index_name)

AttributeError: 'Pinecone' object has no attribute 'from_texts'

def load_data(data):
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()
    return documents

extracted_data = load_data("E:\\data")

def text_split(extracted_data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap = 20)
    text_chunks = text_splitter.split_documents(extracted_data)
    return text_chunks

text_chunks = text_split(extracted_data)


def download_hugging_face_embeddings():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return embeddings

embeddings = download_hugging_face_embeddings()

query_result = embeddings.embed_query("Hello World")
print("Length", len(query_result))

from dotenv import load_dotenv
import os
load_dotenv()
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_environment = os.getenv("PINECONE_API_ENV")

pc = Pinecone(api_key=pinecone_api_key, environment=pinecone_environment)
index_name = "medical-bot"  # My index name created in the Pinecone DB

#Creating Embeddings for Each of The Text Chunks & storing
docsearch=Pinecone.from_texts([t.page_content for t in text_chunks], embeddings, index_name=index_name)

Ghorban M. Tavakoly · Accepted Answer · 2024-05-05 21:00:06Z

Instead of your last two lines, try the code below:

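!pip install langchain_pinecone

import os
from langchain_pinecone import PineconeVectorStore

# langchain_pinecone reads the key from the PINECONE_API_KEY environment variable;
# pinecone_api_key is the variable loaded from .env in the question
os.environ['PINECONE_API_KEY'] = pinecone_api_key

# Connects to the existing index; no vectors are written by this line
PineconeVectorStore(index_name=index_name, embedding=embeddings)

# Embeds every chunk and upserts the vectors into the index
vectorstore_from_docs = PineconeVectorStore.from_documents(
    text_chunks,
    index_name=index_name,
    embedding=embeddings
)

# pc is the Pinecone client created in the question
index = pc.Index(index_name)
index.describe_index_stats()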
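
A likely reason the original call fails: `pc` is an instance of the Pinecone SDK client, while `from_texts` and `from_documents` are class-level constructors on the LangChain vector store class, so they are called on `PineconeVectorStore` itself rather than on the client. The sketch below reproduces that difference with two stand-in classes (`ClientLike` and `VectorStoreLike` are hypothetical names for illustration, not part of either library):

```python
# Stand-in classes (hypothetical, for illustration only). They mimic the
# shape of the two real classes but do not talk to any service.

class ClientLike:
    """Plays the role of the Pinecone SDK client (the `pc` object)."""

    def __init__(self, api_key):
        self.api_key = api_key


class VectorStoreLike:
    """Plays the role of a LangChain vector store class."""

    def __init__(self, texts):
        self.texts = list(texts)

    @classmethod
    def from_texts(cls, texts):
        # Class-level constructor: called on the class, not on a client.
        return cls(texts)


pc = ClientLike(api_key="dummy-key")

try:
    # Same shape as the failing line in the question.
    pc.from_texts(["chunk one", "chunk two"])
except AttributeError as err:
    error_message = str(err)

# Calling the constructor on the vector store class works.
store = VectorStoreLike.from_texts(["chunk one", "chunk two"])

print(error_message)
print(len(store.texts))
```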

After the code runs successfully, refresh your Pinecone index. You should then see the vector count.
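
To check the result from the notebook as well, one option is to compare the index stats with the embedding length printed earlier. This is a minimal sketch, assuming the result of `describe_index_stats()` can be read as a mapping with `dimension` and `total_vector_count` keys; `summarize_stats` and the sample values are hypothetical, not real output:

```python
def summarize_stats(stats, embedding_length):
    """Return (vector_count, dimensions_match) from an index-stats mapping.

    Assumes `stats` exposes "dimension" and "total_vector_count" keys.
    """
    vector_count = stats.get("total_vector_count", 0)
    dimensions_match = stats.get("dimension") == embedding_length
    return vector_count, dimensions_match


# Hand-written sample, not real output. 384 is the vector length
# produced by sentence-transformers/all-MiniLM-L6-v2.
sample_stats = {"dimension": 384, "total_vector_count": 0}

count, dims_ok = summarize_stats(sample_stats, embedding_length=384)
print("vectors:", count, "dimension matches:", dims_ok)
```

A count of zero after the run would suggest nothing was upserted, and a dimension mismatch would mean the index was created for a different embedding size than the model produces.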

edited May 5, 2024 at 21:00

Ghorban M. Tavakoly

answered May 2, 2024 at 2:35

Paresha U

1 Comment

Jan_B Over a year ago

For better readability, you should not format code blocks as quotations. Instead, enclose them in three backticks.
