
When using RAG and chat memory together, multiple identical copies of the same retrieved information are sent to the AI when asking related questions.

I have

import java.util.ArrayList;
import java.util.List;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiChatModelName;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.memory.ChatMemoryAccess;

interface ChatInterface extends ChatMemoryAccess {
    String chat(@MemoryId int memoryId, @UserMessage String userMessage);
}

public class RagHistoryBug {

    public static void main(String[] args) throws Exception {
        String apiKey = "Can't provide my key, but I am sure this bug also shows up with local run ai";
        OpenAiChatModel chatModel = OpenAiChatModel.builder()
                .apiKey(apiKey)
                .modelName(OpenAiChatModelName.GPT_4_O_MINI)
                .build();

        AiServices<ChatInterface> builder = AiServices.builder(ChatInterface.class)
                .chatModel(chatModel)
                .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(11));

        ContentRetriever contentRetriever = new ContentRetriever() {
            @Override
            public List<Content> retrieve(Query q) {
                List<Content> list = new ArrayList<>();
                list.add(Content.from("I have a cat called Pulasy. It likes mice very much"));
                return list;
            }
        };
        builder = builder.contentRetriever(contentRetriever);
        ChatInterface assistant = builder.build();

        System.out.println(assistant.chat(1, "What is the name of my cat?"));
        System.out.println(assistant.chat(1, "Do my cat like mice?"));
        System.out.println(assistant.chat(1, "Is Pulasy a lion?"));

        ChatMemory mem = assistant.getChatMemory(1);
        List<ChatMessage> messages = mem.messages();
        for (ChatMessage msg : messages) {
            System.out.println("Msg:" + msg);
        }
    }
}



If I run this program, the output is:

Your cat's name is Pulasy.
Yes, your cat Pulasy likes mice very much.
No, Pulasy is not a lion; Pulasy is a cat.
Msg:UserMessage { name = null contents = [TextContent { text = "What is the name of my cat?

Answer using the following information:
I have a cat called Pulasy. It likes mice very much" }] }
Msg:AiMessage { text = "Your cat's name is Pulasy." toolExecutionRequests = [] }
Msg:UserMessage { name = null contents = [TextContent { text = "Do my cat like mice?

Answer using the following information:
I have a cat called Pulasy. It likes mice very much" }] }
Msg:AiMessage { text = "Yes, your cat Pulasy likes mice very much." toolExecutionRequests = [] }
Msg:UserMessage { name = null contents = [TextContent { text = "Is Pulasy a lion?

Answer using the following information:
I have a cat called Pulasy. It likes mice very much" }] }
Msg:AiMessage { text = "No, Pulasy is not a lion; Pulasy is a cat." toolExecutionRequests = [] }

Note that:

Answer using the following information:
I have a cat called Pulasy. It likes mice very much

is included 3 times. (And I checked the used input tokens; it really is included 3 times.) How do I prevent that?

  • I only understand Python LangChain, so this may be wrong, but checking for duplicates is usually on you, the developer, to implement. How should LangChain know you don't want the same document for similar queries? RAG has no memory itself, and it's possible you DO want the same document for related queries. Commented Jul 29 at 1:16
  • I don't know the surroundings of your code; you did not provide enough for that. If you have one process that covers all queries, find a location that has access to where your ContentRetriever returns the result. Use a HashSet<TheResultClass>. If the process is more abstract (like a web server), use a WeakHashMap<UserInfos, TheResultClass>. (UserInfos is a class with data regarding the user, the current process, the result etc., to identify the exact case where you want to prevent duplicates. Make sure equals() and hashCode() exist.) With this, register answers and identify duplicates. Commented Jul 29 at 8:57
  • @nabulator But I do want the same document for similar cases. I just don't want to send in 8 copies to ChatGPT. One is enough, and there is no purpose in sending the same document multiple times. Commented Jul 29 at 11:48
  • @JayC667 The problem is: how do I know if content is redundant? ContentRetriever has no way to know, because redundancy depends on exactly what is in the ChatMemoryProvider for the given MemoryId, and ContentRetriever doesn't have access to that (it doesn't have a memory id). So I guess I will have to implement my own ChatMemoryProvider which filters out old duplicates. Commented Jul 29 at 12:39
  • @JayC667 I just updated with a complete question showing the issue. Commented Jul 29 at 14:30
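The memory-side idea from the last comment can be sketched in plain Java. This is a hypothetical stand-in working on plain strings, not a real ChatMemory implementation; the marker string matches what the captured history above shows, but everything else (class and method names) is assumed:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: filter duplicates on the memory side, where the full history is
// visible. Before storing a user message, strip any injected context block
// that an earlier stored message already contains.
public class ContextStrippingHistory {

    // Separator that the RAG injector puts between the question and the
    // retrieved context (taken from the captured history above).
    private static final String MARKER = "\n\nAnswer using the following information:\n";

    private final List<String> messages = new ArrayList<>();
    private final Set<String> seenContext = new HashSet<>();

    public void add(String message) {
        int at = message.indexOf(MARKER);
        if (at >= 0) {
            String context = message.substring(at + MARKER.length());
            if (!seenContext.add(context)) {
                // Context already stored earlier: keep only the question.
                message = message.substring(0, at);
            }
        }
        messages.add(message);
    }

    public List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        ContextStrippingHistory history = new ContextStrippingHistory();
        String context = "I have a cat called Pulasy. It likes mice very much";
        history.add("What is the name of my cat?" + MARKER + context);
        history.add("Do my cat like mice?" + MARKER + context);
        // The second message is stored without the repeated context block.
        System.out.println(history.messages().get(1));
    }
}
```

A real version would wrap a LangChain4j ChatMemory and apply the same stripping to the text of incoming UserMessages before delegating to it.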

1 Answer


Yeah, this is a common issue when you mix RAG and chat memory. The retriever adds the same information on every turn, and the memory stores it blindly, so you end up with repeated chunks bloating the prompt.

Quick fix: either dedupe the retrieved content before it is added, or use something like mem0 or flumes ai, which track memory as structured facts and avoid repeating the same content across turns.
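The dedupe option can be sketched without any framework types. The SnippetRetriever interface below is a hypothetical stand-in for LangChain4j's ContentRetriever, working on plain strings; note the caveat from the comments still applies, so you would need one wrapper instance per conversation, since the retriever itself never sees a memory id:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a content retriever, working on plain strings.
interface SnippetRetriever {
    List<String> retrieve(String query);
}

// Wraps a retriever and suppresses snippets it has already returned earlier in
// the same conversation, so each piece of context reaches the model only once.
public class DedupingRetriever implements SnippetRetriever {

    private final SnippetRetriever delegate;
    private final Set<String> alreadySent = new LinkedHashSet<>();

    public DedupingRetriever(SnippetRetriever delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> retrieve(String query) {
        List<String> fresh = new ArrayList<>();
        for (String snippet : delegate.retrieve(query)) {
            if (alreadySent.add(snippet)) { // add() returns false for duplicates
                fresh.add(snippet);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        SnippetRetriever base =
                q -> List.of("I have a cat called Pulasy. It likes mice very much");
        SnippetRetriever deduped = new DedupingRetriever(base);

        System.out.println(deduped.retrieve("What is the name of my cat?")); // full snippet
        System.out.println(deduped.retrieve("Do my cat like mice?"));        // []
    }
}
```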


1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
