Not a coding question, but a documentation omission that is nowhere mentioned online at this point. When using the Langchain CSVLoader, which column is being vectorized via the OpenAI embeddings I am using?
I ask because viewing this code below, I vectorized a sample CSV, did searches (on Pinecone) and consistently received back DISsimilar responses. How do know which column Langchain is actually identifying to vectorize?
loader = CSVLoader(file_path=file, metadata_columns=['col2', 'col3', 'col4','col5'])
langchain_docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
docs = text_splitter.split_documents(langchain_docs)
for doc in docs:
doc.metadata.pop('source')
doc.metadata.pop('row')
my_index = pc_store.from_documents(docs, embeddings, index_name=PINECONE_INDEX_NAME)
I am assuming the CSVLoader is then identifying col1 to vectorize. But, searches of Pinecone are terrible, leading me to think some other column is being vectorized.