
I'm using LlamaIndex 0.14.7. I would like to embed the document text without concatenating the metadata, because I store a long text in the metadata. Here's my code:

from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.vector_stores import SimpleVectorStore

table_vec_store: SimpleVectorStore = SimpleVectorStore()
pipeline: IngestionPipeline = IngestionPipeline(
    transformations=[
        # include_metadata=False so the splitter should not use metadata
        SentenceSplitter(chunk_size=300, chunk_overlap=15, include_metadata=False),
        embed_model,
    ],
    vector_store=table_vec_store,
)
pipeline.run(documents=table_documents)
self._table_index = VectorStoreIndex.from_vector_store(table_vec_store)

Even though I configured the ingestion pipeline and told the sentence splitter not to include metadata, I still get this error:

ValueError: Metadata length (348) is longer than chunk size (300). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.

I use the document text for indexing, but after retrieval I also need the long text from the metadata, so I cannot simply drop it. How should I fix my code? Thanks
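For reference, my understanding of why the error occurs is roughly the following (a plain-Python sketch of the behavior I'm assuming, not LlamaIndex's actual code): the metadata is rendered as "key: value" lines and counted against the chunk size when building the text to embed, unless a key is excluded. The `long_text` key name here is a placeholder for my long metadata field.

```python
# Sketch (assumption): metadata is rendered as "key: value" lines and
# prepended to the node text for embedding unless the key is excluded,
# so a long metadata value eats into the chunk_size budget.
# "long_text" is a placeholder name for my long metadata field.

def build_embed_text(text: str, metadata: dict, excluded_keys: set) -> str:
    meta_lines = [
        f"{key}: {value}"
        for key, value in metadata.items()
        if key not in excluded_keys
    ]
    meta_str = "\n".join(meta_lines)
    return f"{meta_str}\n\n{text}" if meta_str else text

metadata = {"source": "table_1.csv", "long_text": "x" * 348}

# Without exclusion, the 348-char value alone exceeds chunk_size=300.
full = build_embed_text("row contents", metadata, excluded_keys=set())

# Excluding the long key keeps the metadata contribution small.
small = build_embed_text("row contents", metadata, excluded_keys={"long_text"})

print(len(full) > 348)   # True
print(len(small) < 50)   # True
```

So it seems the metadata string itself is what trips the length check, regardless of the splitter's `include_metadata` flag.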
