I'm building a voice-based calling system where users can create AI agents that make outbound phone calls. The agent uses Deepgram for real-time transcription and ElevenLabs/Cartesia for speech generation.

For each customer on the platform, I maintain a dedicated knowledge base. Users can upload PDFs, documents, or text, and I chunk these using a recursive text splitter and store the embeddings in Pinecone (both dense and sparse vectors).
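For reference, the chunking step works roughly like the sketch below. It is a simplified stand-in for a recursive text splitter, not my exact code: the function name `recursive_split`, the separator order, and the `max_chars` default are all illustrative assumptions, and embedding and upserting to Pinecone are left out.

```python
from typing import List, Tuple


def recursive_split(text: str, max_chars: int = 200,
                    separators: Tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    """Split text into chunks of at most max_chars, trying coarse separators first.

    Hypothetical helper for illustration only. The separator itself is dropped
    wherever a chunk boundary falls on it.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks: List[str] = []
        current = ""
        for part in text.split(sep):
            # Greedily pack consecutive parts into the current chunk.
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
                current = ""
            if len(part) <= max_chars:
                current = part
            else:
                # Part is still too long: recurse with the finer separators only.
                chunks.extend(recursive_split(part, max_chars, separators[i + 1:]))
        if current:
            chunks.append(current)
        return [c.strip() for c in chunks if c.strip()]

    # No separator applies: fall back to a hard cut at max_chars.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Paragraph breaks are tried first so that chunks tend to follow the document's own structure, and finer separators are only used for pieces that are still too long.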

Before a call, I query the customer’s knowledge base for information such as the company overview and inject the results into the system prompt.
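The injection step looks roughly like this sketch. It assumes the retrieval results have already been reduced to `(text, score)` pairs; `build_system_prompt`, the `max_context_chars` budget, and the "Company context" heading are hypothetical names chosen for illustration, not my production code or any library's API.

```python
from typing import List, Tuple


def build_system_prompt(base_prompt: str,
                        matches: List[Tuple[str, float]],
                        max_context_chars: int = 1500) -> str:
    """Append the highest-scoring, de-duplicated chunks to the base prompt.

    `matches` is assumed to be a list of (chunk_text, relevance_score) pairs
    already returned by the vector store query.
    """
    seen = set()
    selected: List[str] = []
    used = 0
    # Highest score first, so the character budget goes to the best chunks.
    for text, _score in sorted(matches, key=lambda m: m[1], reverse=True):
        # Normalise case and whitespace so near-identical chunks count once.
        key = " ".join(text.lower().split())
        if not key or key in seen:
            continue
        if used + len(text) > max_context_chars:
            continue  # skip chunks that would exceed the budget
        seen.add(key)
        selected.append(text.strip())
        used += len(text)

    if not selected:
        return base_prompt
    context = "\n".join(f"- {t}" for t in selected)
    return f"{base_prompt}\n\nCompany context:\n{context}"
```

Capping the injected context keeps the system prompt short, which matters for latency on a live call.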

I'm trying to improve two things:

  1. RAG Quality: I’m not fully satisfied with the answer accuracy and relevance. What approaches or design changes can help me significantly improve my RAG quality, especially for short spoken queries?

  2. Real-Time Voice Search: How can I perform fast, real-time retrieval using the streaming transcripts from Deepgram during ongoing calls? Are there recommended architectures or pipelines for running RAG continuously as the user speaks?

Any insights based on production-grade voice agents or improvements to my approach would be very helpful.

2 Replies

Please do not use AI text generators for rewriting questions (or answers, for that matter). AI generated content is not allowed on Stack Overflow.
