Question:
I'm building a memory-augmented AI system using RAG with persistent vector storage, but I'm running into memory leaks and context contamination between sessions. Problem:
- Vector embeddings aren't garbage-collected after context switches
- Previous session embeddings bleed into new conversations
- FAISS index performance degrades after ~1000 retrievals
Current Implementation:
```python
class MemoAI:
    def __init__(self):
        self.vector_store = FAISS.load_local("./embeddings", embeddings)
        self.memory_buffer = ConversationSummaryBufferMemory(
            llm=llm, max_token_limit=2000
        )

    def add_memory(self, text, metadata):
        chunks = self.recursive_splitter.split_text(text)
        embeddings = self.embedder.embed_documents(chunks)
        # Problem: these embeddings persist even after the session ends
        self.vector_store.add_embeddings(
            [(chunk, embedding) for chunk, embedding in zip(chunks, embeddings)],
            metadatas=[metadata] * len(chunks),
        )

    def retrieve_context(self, query, k=5):
        # Issue: returns stale chunks from previous sessions
        return self.vector_store.similarity_search_with_score(
            query, k=k, filter={"session_id": self.current_session}
        )
```
Reproducible Issue:
```python
# Session 1
memo_ai.add_memory("User likes Python", {"session_id": "session_1"})

# Session 2 (new user)
memo_ai.switch_session("session_2")
result = memo_ai.retrieve_context("What programming language?")
# BUG: returns "likes Python" from session_1 despite the filter
# Memory usage: 2.3GB and growing (started at 500MB)
```
What I've Tried:
- Manual cleanup with `del self.vector_store` and `gc.collect()`: memory is not released
- Per-session FAISS indexes: too slow for real-time use
- Metadata filtering: inconsistent results
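For context on why `del` plus `gc.collect()` didn't help: the vectors live inside the FAISS index's native allocation, not as individually collectable Python objects, so the only way to shrink the index is to remove vectors by ID. The sketch below shows the bookkeeping pattern that makes that possible — every embedding gets an ID recorded against its session, and ending a session deletes exactly those IDs. It is a pure-Python stand-in (the dict plays the role of the index); with real FAISS the analogous step is wrapping the index in an ID-mapping layer and calling `remove_ids`:

```python
# Sketch of explicit per-session ID bookkeeping. A dict stands in for the
# vector index so the pattern is runnable without FAISS installed.

class SessionScopedStore:
    def __init__(self):
        self._vectors = {}      # id -> (vector, metadata)
        self._by_session = {}   # session_id -> set of ids
        self._next_id = 0

    def add(self, session_id, vector, metadata):
        vid = self._next_id
        self._next_id += 1
        self._vectors[vid] = (vector, {**metadata, "session_id": session_id})
        self._by_session.setdefault(session_id, set()).add(vid)
        return vid

    def end_session(self, session_id):
        # Delete the session's vectors outright instead of hoping gc
        # reclaims them; with FAISS this is where remove_ids would go.
        for vid in self._by_session.pop(session_id, set()):
            del self._vectors[vid]

    def size(self):
        return len(self._vectors)
```

A caveat from FAISS's documented behaviour: not every index type supports removal, and removal from flat indexes compacts the array (an O(n) operation), so batching removals at session end is usually preferable to per-document deletes.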
Environment:
- LangChain 0.1.0, FAISS-GPU 1.7.2, Python 3.10
- 32GB RAM, RTX 3090
Question: How can I properly isolate memory between sessions without rebuilding the entire vector store? Is there a pattern for efficient garbage collection of embeddings in production RAG systems?
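To make the isolation requirement concrete, here is the shape of answer I'm hoping for: restrict the search to the current session's ID set *before* scoring, so other sessions are structurally excluded rather than filtered out afterwards. Brute-force, pure-Python sketch (hypothetical names; a real implementation would push the ID restriction down into the index):

```python
import math

def search_session(vectors, session_ids, query, session, k=5):
    """Score only the vectors registered to `session`; vectors from other
    sessions can never appear in results, no filter required."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    candidates = session_ids.get(session, set())   # pre-filter by ID
    scored = sorted((dist(vectors[i], query), i) for i in candidates)
    return [i for _, i in scored[:k]]

vectors = {0: [0.0, 0.1], 1: [0.9, 0.9], 2: [0.2, 0.0]}
session_ids = {"session_1": {0}, "session_2": {1, 2}}
print(search_session(vectors, session_ids, [0.0, 0.0], "session_2", k=1))  # -> [2]
```

Is there an established way to get this pre-filtering behaviour (or something equivalent) out of FAISS or another vector store without paying the per-session-index cost I measured above?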