Regarding RAG for telephony with Deepgram

Question

I'm building a voice-based calling system where users can create AI agents that make outbound phone calls. The agent uses Deepgram for real-time transcription and ElevenLabs/Cartesia for speech generation.

For each customer on the platform, I maintain a dedicated knowledge base. Users can upload PDFs, documents, or text, and I chunk these using a recursive text splitter and store the embeddings in Pinecone (both dense and sparse vectors).
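To make the chunking step concrete, here is a minimal sketch of recursive splitting in plain Python. It is only an illustration of the general technique, not the actual pipeline: the function name `recursive_split`, the separator order, and the `max_chars` default are all assumptions, and it leaves out overlap between chunks as well as the embedding and Pinecone upsert steps.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    max_chars: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Tries the coarsest separator first (paragraphs), then falls back to
    finer ones (lines, sentences, words) for pieces that are still too long.
    Separator order and max_chars are illustrative assumptions.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks: List[str] = []
        current = ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chars:
                # Keep merging neighbouring pieces while they still fit.
                current = candidate
                continue
            if current:
                chunks.append(current)
            if len(part) > max_chars:
                # This piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_split(part, max_chars, separators[i + 1:]))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        return chunks

    # No separator present at all: fall back to a hard character cut.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]
```

Smaller chunks tend to match short spoken queries more closely, so `max_chars` is one of the knobs worth experimenting with.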

Before a call, I query the customer’s knowledge base for information such as the company overview and inject the results into the system prompt.
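The pre-call injection step can be sketched roughly as follows. This assumes the retrieval call has already returned a relevance-ordered list of text chunks. The function name `build_system_prompt`, the "Company context" heading, and the character budget are hypothetical choices for illustration, not the actual implementation.

```python
from typing import List


def build_system_prompt(
    base_instructions: str,
    retrieved_chunks: List[str],
    max_context_chars: int = 1500,
) -> str:
    """Append retrieved knowledge-base chunks to the agent's base instructions.

    retrieved_chunks is assumed to be ordered most relevant first; chunks are
    added until the (hypothetical) character budget would be exceeded.
    """
    context_lines: List[str] = []
    used = 0
    for chunk in retrieved_chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty results
        if used + len(chunk) > max_context_chars:
            break  # budget reached; drop lower-ranked chunks
        context_lines.append(f"- {chunk}")
        used += len(chunk)

    if not context_lines:
        return base_instructions
    return base_instructions + "\n\nCompany context:\n" + "\n".join(context_lines)


prompt = build_system_prompt(
    "You are a polite outbound calling agent.",
    ["Acme sells solar panels.", "Support hours are 9am to 5pm."],
)
```

A fixed budget keeps the prompt size predictable, which matters for latency on a live call.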

I'm trying to improve two things:

1. RAG quality: I’m not fully satisfied with the answer accuracy and relevance. What approaches or design changes can help me significantly improve my RAG quality, especially for short spoken queries?
2. Real-time voice search: How can I perform fast, real-time retrieval using the streaming transcripts from Deepgram during ongoing calls? Are there recommended architectures or pipelines for running RAG continuously as the user speaks?

Any insights based on production-grade voice agents or improvements to my approach would be very helpful.

Answer 1 · 2025-11-16 15:18:36Z

Dan Getz


Please do not use AI text generators for rewriting questions (or answers, for that matter). AI-generated content is not allowed on Stack Overflow.
