
I'm using LlamaIndex 0.14.7. I would like to embed the document text without concatenating the metadata, because I store a long text in the metadata. Here's my code:

from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.vector_stores import SimpleVectorStore

table_vec_store: SimpleVectorStore = SimpleVectorStore()
pipeline: IngestionPipeline = IngestionPipeline(
    transformations=[
        # include_metadata=False so the splitter should not use metadata
        SentenceSplitter(chunk_size=300, chunk_overlap=15, include_metadata=False),
        embed_model,
    ],
    vector_store=table_vec_store,
)
pipeline.run(documents=table_documents)
self._table_index = VectorStoreIndex.from_vector_store(table_vec_store)

Even though I configured the ingestion pipeline and told the sentence splitter not to include metadata, I still get this error:

ValueError: Metadata length (348) is longer than chunk size (300). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.

I use the document text for indexing, but after retrieval I also need the long text from the metadata, so I cannot simply drop it. How should I fix my code? Thanks
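For reference, my understanding of why the error occurs is roughly the following (a plain-Python sketch of the behavior I'm assuming, not LlamaIndex's actual code): the metadata is rendered as "key: value" lines and counted against the chunk size when building the text to embed, unless a key is excluded. The `long_text` key name here is a placeholder for my long metadata field.

```python
# Sketch (assumption): metadata is rendered as "key: value" lines and
# prepended to the node text for embedding unless the key is excluded,
# so a long metadata value eats into the chunk_size budget.
# "long_text" is a placeholder name for my long metadata field.

def build_embed_text(text: str, metadata: dict, excluded_keys: set) -> str:
    meta_lines = [
        f"{key}: {value}"
        for key, value in metadata.items()
        if key not in excluded_keys
    ]
    meta_str = "\n".join(meta_lines)
    return f"{meta_str}\n\n{text}" if meta_str else text

metadata = {"source": "table_1.csv", "long_text": "x" * 348}

# Without exclusion, the 348-char value alone exceeds chunk_size=300.
full = build_embed_text("row contents", metadata, excluded_keys=set())

# Excluding the long key keeps the metadata contribution small.
small = build_embed_text("row contents", metadata, excluded_keys={"long_text"})

print(len(full) > 348)   # True
print(len(small) < 50)   # True
```

So it seems the metadata string itself is what trips the length check, regardless of the splitter's `include_metadata` flag.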
