I'm trying to find duplicate entries in the values of a Map. The complication is that each value is a list of records with multiple attributes/properties. Basically, if a title shows up more than once in the database, I want to mark one entry as unique and mark the rest as duplicates.

Here's my current code:

// I have a Map that looks like...

host1 : id | title | host1 | url | state | duplicate
        id | title | host1 | url | state | duplicate

host2 : id | title | host2 | url | state | duplicate
        id | title | host2 | url | state | duplicate


    for (Map.Entry<String, List<Record>> e : recordsByHost.entrySet()) {
      boolean executed = false;
      for (Record r : e.getValue()) {
        // Count how many records for this host share r's title.
        int frequency = Collections.frequency(
          e
            .getValue()
            .stream()
            .map(Record::getTitle)
            .collect(Collectors.toList()),
          r.getTitle()
        );
        if ((frequency > 1) && (!executed)) {
          markDuplicates(r.getId(), r.getTitle());
          executed = true;
        } else {
          executed = false;
        }
      }
    }

The issue is that when the frequency is more than 2 (three records with the same title), the if condition evaluates to false and the third record (the second duplicate) is treated as "unique".

I've been trying to rework my logic but I'm afraid I'm stuck. Any help / suggestions to get me unstuck would be greatly appreciated.

  • Do you absolutely have to use a Stream? I think responding to the boolean value returned from Set.add would be more useful. Commented Aug 4, 2021 at 18:42
  • @VGR - Not necessarily. What exactly do you mean by "responding to the boolean value returned from Set.add"? Do you have helpful articles you can share? Commented Aug 4, 2021 at 18:54

1 Answer

Set.add (and, in fact, Collection.add) returns true if and only if the value was actually added to the Set. Since a Set always enforces uniqueness, a return value of false means the value was already present, so you can use it to find duplicates:

import java.util.HashSet;
import java.util.Set;

void markDuplicates(Iterable<? extends Record> records) {
    // Titles seen so far; the first record with a given title is kept as the unique one.
    Set<String> foundTitles = new HashSet<>();

    for (Record r : records) {
        String title = r.getTitle();
        if (title != null && !foundTitles.add(title)) {
            // title was not added, because it's already been found.
            markAsDuplicate(r);
        }
    }
}
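For a version that can be run end to end, here is a minimal sketch applied to a map shaped like the one in the question. The Rec class, its fields, and the DuplicateTitles class name are hypothetical stand-ins, since the real Record class isn't shown. It also assumes duplicates are detected per host, as in the question's loop; for database-wide detection, the set would be created once, outside the outer loop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateTitles {

    // Hypothetical stand-in for the question's Record class (only the fields used here).
    static final class Rec {
        final int id;
        final String title;
        boolean duplicate;

        Rec(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Flags every record whose title was already seen in the same host's list.
    // The first record with a given title stays unflagged (the "unique" one).
    static void markDuplicates(Map<String, List<Rec>> recordsByHost) {
        for (List<Rec> records : recordsByHost.values()) {
            Set<String> seenTitles = new HashSet<>(); // assumption: reset for each host
            for (Rec r : records) {
                // add() returns false when the title is already in the set.
                if (r.title != null && !seenTitles.add(r.title)) {
                    r.duplicate = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Rec>> recordsByHost = new LinkedHashMap<>();
        recordsByHost.put("host1", Arrays.asList(
                new Rec(1, "a"), new Rec(2, "a"), new Rec(3, "a"), new Rec(4, "b")));

        markDuplicates(recordsByHost);

        for (Rec r : recordsByHost.get("host1")) {
            System.out.println(r.id + " " + r.title + " duplicate=" + r.duplicate);
        }
        // prints:
        // 1 a duplicate=false
        // 2 a duplicate=true
        // 3 a duplicate=true
        // 4 b duplicate=false
    }
}
```

With three records sharing a title, the second and third are both flagged, which is the case the executed flag in the question gets wrong. Each title is also looked up once per record instead of recounting the whole list with Collections.frequency.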

1 Comment

Learn something new every day! Thanks for that! I didn't use all of the code you provided, but I am now using set.add() instead of the boolean executed flag, and all my unit tests work now! :)
