3

My apologies if this is a simple basic info that I should be knowing. This is the first time I am trying to use Java 8 streams and other features.

I have two ArrayLists containing same type of objects. Let's say list1 and list2. Let's say the lists has Person objects with a property "employeeId".

The scenario is that I need to merge these lists. However, list2 may have some objects that are same as in list1. So I am trying to remove the objects from list2 that are same as in list1 and get a result list that then I can merge in list1.

I am trying to do this with Java 8 removeIf() and stream() features. Following is my code:

public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
        List<PersonDto> filteredList = list2.removeIf(list2Obj -> {
            list1.stream()
                .anyMatch( list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()) );
        } );
    }

The above code is giving compile error as below:

The method removeIf(Predicate) in the type Collection is not applicable for the arguments (( list2Obj) -> {})

So I changed the list2Obj at the start of "removeIf()" to (<PersonDto> list2Obj) as below:

public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
            List<PersonDto> filteredList = list2.removeIf((<PersonDto> list2Obj) -> {
                list1.stream()
                    .anyMatch( list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()) );
            } );
        }

This gives me an error as below:

Syntax error on token "<", delete this token for the '<' in (<PersonDto> list2Obj) and Syntax error on token(s), misplaced construct(s) for the part from '-> {'

I am at loss on what I really need to do to make it work.

Would appreciate if somebody can please help me resolve this issue.

2 Answers 2

2

I've simplified your function just a little bit to make it more readable:

public static List<PersonDto> removeDuplicates(List<PersonDto> left, List<PersonDto> right) {
    left.removeIf(p -> {
        return right.stream().anyMatch(x -> (p.getEmployeeId() == x.getEmployeeId()));
    });
    return left;
}

Also notice that you are modifying the left parameter, you are not creating a new List.

You could also use: left.removeAll(right), but you need equals and hashcode for that and it seems you don't have them; or they are based on something else than employeeId.

Another option would be to collect those lists to a TreeSet and use removeAll:

TreeSet<PersonDto> leftTree = left.stream()
      .collect(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));

TreeSet<PersonDto> rightTree = right.stream()
      .collect(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));

leftTree.removeAll(rightTree);
Sign up to request clarification or add additional context in comments.

7 Comments

thanks for the quick reply, appreciated! I wasn't aware of removeAll() but you are right the equals and hashcode are based on something else. In you code, what is the first return returning? I mean, I know it's to return the filtered list but the placement of return keyword is throwing me off. I would think that first return before left.removeIf(.... The second return is returning the original list when there are no duplicates.
@Gauzy removeIf takes a Predicate as input; thus that returns a boolean.
Eugene, Hi! In your second snippet, I think you don't need to stream the lists in order to create the TreeSets. Just create the TreeSets with the comparator, and then call addAll with each list.
@FedericoPeraltaSchaffner hey! that would indeed merge them, but the OP wanted So I am trying to remove the objects from list2 that are same as in list1, so I guess it really depends on what the OP actually needs
@FedericoPeraltaSchaffner first its really hard to type your name from the phone :) second, yes, could be done like this too
|
2

I understand you are trying to merge both lists without duplicating the elements that belong to the intersection. There are many ways to do this. One is the way you've tried, i.e. remove elements from one list that belong to the other, then merge. And this, in turn, can be done in several ways.

One of these ways would be to keep the employee ids of one list in a HashSet and then use removeIf on the other list, with a predicate that checks whether each element has an employee id that is contained in the set. This is better than using anyMatch on the second list for each element of the first list, because HashSet.contains runs in O(1) amortized time. Here's a sketch of the solution:

// Determine larger and smaller lists
boolean list1Smaller = list1.size() < list2.size();
List<PersonDto> smallerList = list1Smaller ? list1 : list2;
List<PersonDto> largerList = list1Smaller ? list2 : list1;

// Create a Set with the employee ids of the larger list
// Assuming employee ids are long
Set<Long> largerSet = largerList.stream()
    .map(PersonDto::getEmployeeId)
    .collect(Collectors.toSet());

// Now remove elements from the smaller list
smallerList.removeIf(dto -> largerSet.contains(dto.getEmployeeId()));

The logic behind this is that HashSet.contains will take the same time for both a large and a small set, because it runs in O(1) amortized time. However, traversing a list and removing elements from it will be faster on smaller lists.

Then, you are ready to merge both lists:

largerList.addAll(smallerList);

4 Comments

The set of employee ids surely hasn’t the type Set<PersonDto>. And I would assume Collectors.toSet() sufficient. There is no need for a guaranteed HashSet.
@Holger Thanks, I've fixed that. But can we be sure that the set returned by Collectors.toSet() supports hashing?
We can be sure that the JRE developers will not change it to something worse just for fun or in other words, we can expect that toSet() will return a reasonably efficient Set implementation. It doesn’t have to use hashing, e.g. very small sets may perform better than HashSet. Collectors.toSet() allows the JRE to pick the best, i.e. future versions may return immutable sets being better prepared for even faster lookup than the mutable HashSet
@Holger Agreed, let's trust the JDK guys, then. Thank you for all your feedback not only in this answer but also in the others.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.