Question:
I'm building a memory-augmented AI system using RAG with persistent vector storage, but I'm running into memory leaks and context contamination between sessions. Problem:
- Vector embeddings aren't garbage-collected after context switches
- Previous session embeddings bleed into new conversations
- FAISS index performance degrades after ~1000 retrievals
Current Implementation:
```python
class MemoAI:
    def __init__(self):
        self.vector_store = FAISS.load_local("./embeddings", embeddings)
        self.memory_buffer = ConversationSummaryBufferMemory(
            llm=llm, max_token_limit=2000
        )

    def add_memory(self, text, metadata):
        chunks = self.recursive_splitter.split_text(text)
        embeddings = self.embedder.embed_documents(chunks)
        # Problem: these embeddings persist even after the session ends
        self.vector_store.add_embeddings(
            [(chunk, embedding) for chunk, embedding in zip(chunks, embeddings)],
            metadatas=[metadata] * len(chunks),
        )

    def retrieve_context(self, query, k=5):
        # Issue: returns stale chunks from previous sessions
        return self.vector_store.similarity_search_with_score(
            query, k=k, filter={"session_id": self.current_session}
        )
```
Reproducible Issue:
```python
# Session 1
memo_ai.add_memory("User likes Python", {"session_id": "session_1"})

# Session 2 (new user)
memo_ai.switch_session("session_2")
result = memo_ai.retrieve_context("What programming language?")
# BUG: returns "likes Python" from session_1 despite the filter
# Memory usage: 2.3GB and growing (started at 500MB)
```
What I've Tried:
- Manual cleanup with `del self.vector_store` and `gc.collect()`: memory is not released
- Per-session FAISS indexes: too slow for real-time use
- Metadata filtering: inconsistent results
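For context on why `del` plus `gc.collect()` didn't help: the vectors live inside the FAISS index's native allocation, not as individually collectable Python objects, so the only way to shrink the index is to remove vectors by ID. The sketch below shows the bookkeeping pattern that makes that possible — every embedding gets an ID recorded against its session, and ending a session deletes exactly those IDs. It is a pure-Python stand-in (the dict plays the role of the index); with real FAISS the analogous step is wrapping the index in an ID-mapping layer and calling `remove_ids`:

```python
# Sketch of explicit per-session ID bookkeeping. A dict stands in for the
# vector index so the pattern is runnable without FAISS installed.

class SessionScopedStore:
    def __init__(self):
        self._vectors = {}      # id -> (vector, metadata)
        self._by_session = {}   # session_id -> set of ids
        self._next_id = 0

    def add(self, session_id, vector, metadata):
        vid = self._next_id
        self._next_id += 1
        self._vectors[vid] = (vector, {**metadata, "session_id": session_id})
        self._by_session.setdefault(session_id, set()).add(vid)
        return vid

    def end_session(self, session_id):
        # Delete the session's vectors outright instead of hoping gc
        # reclaims them; with FAISS this is where remove_ids would go.
        for vid in self._by_session.pop(session_id, set()):
            del self._vectors[vid]

    def size(self):
        return len(self._vectors)
```

A caveat from FAISS's documented behaviour: not every index type supports removal, and removal from flat indexes compacts the array (an O(n) operation), so batching removals at session end is usually preferable to per-document deletes.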
Environment:
- LangChain 0.1.0, FAISS-GPU 1.7.2, Python 3.10
- 32GB RAM, RTX 3090
Question: How can I properly isolate memory between sessions without rebuilding the entire vector store? Is there a pattern for efficient garbage collection of embeddings in production RAG systems?
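To make the isolation requirement concrete, here is the shape of answer I'm hoping for: restrict the search to the current session's ID set *before* scoring, so other sessions are structurally excluded rather than filtered out afterwards. Brute-force, pure-Python sketch (hypothetical names; a real implementation would push the ID restriction down into the index):

```python
import math

def search_session(vectors, session_ids, query, session, k=5):
    """Score only the vectors registered to `session`; vectors from other
    sessions can never appear in results, no filter required."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    candidates = session_ids.get(session, set())   # pre-filter by ID
    scored = sorted((dist(vectors[i], query), i) for i in candidates)
    return [i for _, i in scored[:k]]

vectors = {0: [0.0, 0.1], 1: [0.9, 0.9], 2: [0.2, 0.0]}
session_ids = {"session_1": {0}, "session_2": {1, 2}}
print(search_session(vectors, session_ids, [0.0, 0.0], "session_2", k=1))  # -> [2]
```

Is there an established way to get this pre-filtering behaviour (or something equivalent) out of FAISS or another vector store without paying the per-session-index cost I measured above?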