
I'm building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages each), often containing numerous tables and charts.
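
To make the setup concrete, here is a trimmed-down sketch of the current pipeline. The table schema, the fixed-size chunker, and the embedding model below are illustrative stand-ins rather than the exact production code:

```python
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
from docling.document_converter import DocumentConverter

# Stand-in embedding model; swap in whatever model you actually use.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def split_into_chunks(text: str, size: int = 1500) -> list[str]:
    # Naive fixed-size splitter; the real chunking strategy is elided here.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(pdf_path: str, conn: psycopg.Connection) -> None:
    # Docling parses the PDF (tables included) and exports it as Markdown.
    doc = DocumentConverter().convert(pdf_path).document
    for chunk in split_into_chunks(doc.export_to_markdown()):
        conn.execute(
            "INSERT INTO chunks (source, content, embedding) VALUES (%s, %s, %s)",
            (pdf_path, chunk, model.encode(chunk)),
        )
    conn.commit()

def knn_retrieve(conn: psycopg.Connection, query: str, k: int = 20) -> list[str]:
    # Plain cosine-distance KNN over all ~30K chunks: the step that gets noisy.
    rows = conn.execute(
        "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT %s",
        (model.encode(query), k),
    ).fetchall()
    return [row[0] for row in rows]

# Assumes CREATE EXTENSION vector; and a table like:
# chunks (id bigserial PRIMARY KEY, source text, content text, embedding vector(384))
conn = psycopg.connect("dbname=ragdb")
register_vector(conn)
```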

I'm facing a few specific challenges:

- 30K+ total chunks across all docs, so plain KNN retrieval gets noisy.
- I tried LLM-based reranking, but it's too slow and expensive to run on all 30K chunks (a rough sketch of the rerank step is below this list).
- I tried summarizing each chunk to improve retrieval, but generating LLM summaries for all 30K sections is too expensive.
- Table chunks are especially difficult: embeddings perform poorly on structured/numeric data, and summary-style embeddings (e.g. the first 300 tokens, or just the heading/caption) aren't sufficient for value-level lookups.

Looking for ideas or proven strategies to:

- Improve precision in initial retrieval at scale
- Handle table-heavy content more effectively
- Reduce cost while preserving accuracy
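
For reference, the reranking attempt looked roughly like this: one scoring call per candidate chunk. The hosting setup (an OpenAI-compatible endpoint in front of the LLaMA model), the placeholder model name, and the prompt are assumptions for illustration, not the exact code:

```python
from openai import OpenAI

# Assumption: the LLaMA model sits behind an OpenAI-compatible endpoint
# (vLLM / llama.cpp server style); the actual hosting isn't shown here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def llm_rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    """Score each candidate chunk 0-10 for relevance and keep the best top_n."""
    scored = []
    for chunk in candidates:
        resp = client.chat.completions.create(
            model="llama",  # placeholder model name
            temperature=0,
            max_tokens=3,
            messages=[{
                "role": "user",
                "content": (
                    "Rate from 0 to 10 how relevant the passage is to the "
                    "question. Reply with only the number.\n\n"
                    f"Question: {query}\n\nPassage: {chunk}"
                ),
            }],
        )
        try:
            score = float(resp.choices[0].message.content.strip())
        except (TypeError, ValueError):
            score = 0.0
        scored.append((score, chunk))
    # One LLM call per candidate is what makes this slow and expensive.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]
```

Even capped at, say, the top 50 KNN candidates, this is dozens of LLM calls per user question, which is where the latency and cost blow up.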

Any ideas, techniques, or tooling (besides LangChain) that have worked for you?
