3

I want to remove duplicate records from an arraylist based on multiple properties. This is a sample domain object class:

private String mdl;
private String ndc;
private String gpi;
private String labelName;
private int seqNo;
private String vendorName;

The mdl, ndc, gpi, and seqNo together make up a unique record. I want to find duplicates in an arraylist that checks for these 4 properties and then removes the record from the list if a record with the same 4 properties already exists in the list.

5
  • 3
    customize the hashcode and equals method, then store the objects to a Set Commented Jul 11, 2019 at 23:36
  • Could you show an example of that? I've already overridden the hashcode and equals method but how exactly would I go about implementing it to only check for these specific properties? Commented Jul 11, 2019 at 23:39
  • 1
    From extensibility point of view I'm wondering if the asker really wants to have equals&hashCode or maybe it would be enough to have a custom comparator and a collection backed by it. This way "id-equivalence" could be kept away from all-field equals (which might be necessary in other part of application). Commented Jul 12, 2019 at 0:15
  • It would help also if you posted the code for the overridden equals and hashcode methods. Commented Jul 12, 2019 at 0:34
  • I don't think this is a duplicate of stackoverflow.com/questions/2265503/… based on OP's selected answer. Seems they want a way to compare without using equals() and hashCode() Commented Jul 12, 2019 at 21:55

2 Answers

4

.equals() and .hashCode() should be overridden to account for your key: mdl, ndc, gpi, and seqNo. There are countless guides to doing this on this site, but something like:

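@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    // instanceof is already false for null, so no separate null check is needed
    if (!(obj instanceof MyClass)) {
        return false;
    }
    MyClass o = (MyClass) obj;
    // Objects.equals (java.util.Objects) is null-safe, unlike mdl.equals(o.mdl)
    return seqNo == o.seqNo
            && Objects.equals(mdl, o.mdl)
            && Objects.equals(ndc, o.ndc)
            && Objects.equals(gpi, o.gpi);
}

@Override
public int hashCode() {
    // Must be built from exactly the fields compared in equals()
    return Objects.hash(mdl, ndc, gpi, seqNo);
}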

There may be more efficient ways of implementing them if that's a concern.

Then you can just convert your list to a set with:

Set<MyClass> set = new HashSet<>(list);

The resulting set won't contain any duplicates. If you still need a list, you can replace the original with the deduplicated values: list = new ArrayList<>(set);

If you want to maintain the order of the items in the original list, instantiate LinkedHashSet instead of HashSet.
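
To make the order-preserving variant concrete, here is a minimal sketch. It assumes a hypothetical class Drug standing in for the domain object (vendorName is left out for brevity), with equals and hashCode defined over the four key fields as above; dedupe is just an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

class DedupDemo {
    // Returns a new list without duplicates, keeping the first occurrence
    // of each key and the original relative order.
    static List<Drug> dedupe(List<Drug> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Drug> list = Arrays.asList(
                new Drug("M1", "N1", "G1", "first", 1),
                new Drug("M2", "N2", "G2", "second", 1),
                new Drug("M1", "N1", "G1", "duplicate of first", 1));
        for (Drug d : dedupe(list)) {
            System.out.println(d.getLabelName()); // prints "first", then "second"
        }
    }
}

// Hypothetical stand-in for the asker's domain class.
class Drug {
    private final String mdl;
    private final String ndc;
    private final String gpi;
    private final String labelName;
    private final int seqNo;

    Drug(String mdl, String ndc, String gpi, String labelName, int seqNo) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.labelName = labelName;
        this.seqNo = seqNo;
    }

    String getLabelName() {
        return labelName;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Drug)) {
            return false;
        }
        Drug o = (Drug) obj;
        // labelName is deliberately ignored: it is not part of the key
        return seqNo == o.seqNo
                && Objects.equals(mdl, o.mdl)
                && Objects.equals(ndc, o.ndc)
                && Objects.equals(gpi, o.gpi);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mdl, ndc, gpi, seqNo);
    }
}
```

Because labelName is not part of the key, the third record counts as a duplicate of the first even though its label differs, and the first one wins.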

Unrelated to your direct question, perhaps consider using a Set instead of List if you want to avoid duplicates in the first place. It will make your code more efficient (less memory usage without the duplicates) and eliminate the need to search for duplicates afterwards.
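
As a rough illustration of that set-first idea: Set.add returns false when an equal element is already present, so a duplicate can be rejected at insertion time. DrugRecord below is a hypothetical, trimmed-down stand-in for the domain class, again assuming equals and hashCode cover only the four key fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SetFirstDemo {
    public static void main(String[] args) {
        Set<DrugRecord> records = new HashSet<>();

        boolean firstAdded = records.add(new DrugRecord("M1", "N1", "G1", 1, "first"));
        // Same four key fields, different label: treated as a duplicate
        boolean dupAdded = records.add(new DrugRecord("M1", "N1", "G1", 1, "other label"));

        System.out.println(firstAdded);     // true
        System.out.println(dupAdded);       // false
        System.out.println(records.size()); // 1
    }
}

// Hypothetical stand-in for the asker's domain class.
class DrugRecord {
    private final String mdl;
    private final String ndc;
    private final String gpi;
    private final int seqNo;
    private final String labelName;

    DrugRecord(String mdl, String ndc, String gpi, int seqNo, String labelName) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.seqNo = seqNo;
        this.labelName = labelName;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof DrugRecord)) {
            return false;
        }
        DrugRecord o = (DrugRecord) obj;
        return seqNo == o.seqNo
                && Objects.equals(mdl, o.mdl)
                && Objects.equals(ndc, o.ndc)
                && Objects.equals(gpi, o.gpi);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mdl, ndc, gpi, seqNo);
    }
}
```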

6 Comments

OP does not say in the question that those methods were overridden to support the uniqueness logic for those 4 variables, so this answer rests on the assumption that they were.
@buræquete Actually, OP did specify in his comment under the question: "I've already overridden the hashcode and equals method"
Yes, but you can override them and include logic that doesn't use those 4 variables as the unique hash values, right? He might have included other fields. He did not say what he did in those methods. If you include in your answer how to do that, maybe then it would be OK ("but how exactly would I go about implementing it to only check for these specific properties?")
@buræquete I see what you're saying. I interpret "how exactly would I go about implementing it to only check for these specific properties?" as meaning OP wants to know how to eliminate the duplicates based on the equals & hashcode functions, not how to correctly implement equals and hashcode. I suppose it's ambiguous.
I would consider this approach inappropriate for a single use case, unless the rules of the object were consistent and allowed for equals (and hashcode) to always be used against these properties - it's not "wrong", but it may not be "right" either
2

You can try doing the following;

List<Obj> list = ...; // list contains multiple objects
Collection<Obj> nonDuplicateCollection = list.stream()
        .collect(Collectors.toMap(Obj::generateUniqueKey, Function.identity(), (a, b) -> a))
        .values();

The merge function (a, b) -> a means that when two objects produce the same key, the map keeps the earlier object and discards the later one. Change it to (a, b) -> b if you would rather keep the later one.

where Obj is;

public static class Obj {

    private String mdl;
    private String ndc;
    private String gpi;
    private String labelName;
    private int seqNo;
    private String vendorName;

    // other getter/setters

    public String generateUniqueKey() {
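        // A separator stops ("ab", "c") and ("a", "bc") from producing the same key;
        // this assumes the fields themselves never contain '|'
        return mdl + "|" + ndc + "|" + gpi + "|" + seqNo;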
    }
}

I'd rather do something like this than override hashCode or equals, whose default behaviour might be needed by other logic. Also, stating the uniqueness rule explicitly in a method like generateUniqueKey is more readable and maintainable than hiding it inside hashCode.
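
One possible variation on the same idea, sketched under assumptions: plain string concatenation can give two different records the same key, and the three-argument Collectors.toMap makes no promise about the order of the resulting values. The sketch below uses a List of the four fields as the key and a LinkedHashMap so that the first occurrence wins and the original order is kept. Item, uniqueKey and dedupe are hypothetical names, and Item is trimmed to the key fields plus labelName.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyDedupDemo {
    // Keeps the first item seen for each key, in encounter order.
    static List<Item> dedupe(List<Item> input) {
        Map<List<Object>, Item> byKey = new LinkedHashMap<>();
        for (Item item : input) {
            byKey.putIfAbsent(item.uniqueKey(), item);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Item> list = Arrays.asList(
                new Item("ab", "c", "G1", 1, "first"),
                // Concatenated, this key would also read "abcG11"
                new Item("a", "bc", "G1", 1, "second"),
                new Item("ab", "c", "G1", 1, "duplicate of first"));
        for (Item item : dedupe(list)) {
            System.out.println(item.getLabelName()); // prints "first", then "second"
        }
    }
}

// Hypothetical stand-in for Obj; equals and hashCode are NOT overridden.
class Item {
    private final String mdl;
    private final String ndc;
    private final String gpi;
    private final int seqNo;
    private final String labelName;

    Item(String mdl, String ndc, String gpi, int seqNo, String labelName) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.seqNo = seqNo;
        this.labelName = labelName;
    }

    String getLabelName() {
        return labelName;
    }

    // List.equals and List.hashCode compare element by element, so no
    // separator is needed and null fields are tolerated.
    List<Object> uniqueKey() {
        return Arrays.<Object>asList(mdl, ndc, gpi, seqNo);
    }
}
```

The domain class keeps its default equals and hashCode here, so the "same record" rule stays local to this one deduplication step.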

1 Comment

You could do it this way if you didn't want to override equals and hashCode for whatever reason, but OP has already implemented them, which seems like the correct thing to do given the "unique record" definition in the question. In which case just putting the list into a set would be a simpler approach.