Problem edit
After a conversation in chat, additional parameters have to be taken into account:
- Multiple Prospect may have the same prospect number, but each of them has a unique primary key in the database. Consequently, the duplicate filter cannot rely on Prospect equality.
- A Prospect has a visited status which defines whether the Prospect has been contacted by the company or not. The two main statuses are NEW and MET. Only one Prospect can be MET. Other duplicates (with the same prospect number) can only be NEW.
Algorithm
The problem needs an additional step to be solved:
- The prospects need to be grouped by prospect number. At this stage, we will have a <ProspectNumber, List<Prospect>> mapping. However, each List<Prospect> must end up with a single element according to the rules defined earlier.
- Within a list, if a prospect has not been met AND another prospect is found with a met status, then the first prospect is to be discarded.
Consequently, the list will be generated with the following rules:
- If a prospect has no duplicate in terms of prospect number, it is kept regardless of its status.
- If a prospect has duplicates in terms of prospect number, only the met one is kept.
- If multiple prospects have the same prospect number but none of them is met, then an arbitrary one is kept: the Stream is not guaranteed to loop in the list order.
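The two steps above can be sketched literally, with an explicit grouping followed by a reduction of each group. This is only an illustration: the Prospect class, its fields and the Status enum below are hypothetical stand-ins for the real entity, and the prospect number is assumed to be a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    enum Status { NEW, MET }

    // Hypothetical stand-in for the real entity; id plays the role of the primary key.
    static final class Prospect {
        final long id;
        final String prospectNumber;
        final Status status;

        Prospect(long id, String prospectNumber, Status status) {
            this.id = id;
            this.prospectNumber = prospectNumber;
            this.status = status;
        }
    }

    static List<Prospect> deduplicate(List<Prospect> prospects) {
        // Step 1: group by prospect number (LinkedHashMap keeps first-seen key order).
        Map<String, List<Prospect>> byNumber = prospects.stream()
                .collect(Collectors.groupingBy(
                        p -> p.prospectNumber,
                        LinkedHashMap::new,
                        Collectors.toList()));

        // Step 2: reduce each group to one element: the MET one if present,
        // otherwise the first element of the group (an arbitrary choice).
        List<Prospect> kept = new ArrayList<>();
        for (List<Prospect> group : byNumber.values()) {
            Prospect chosen = group.stream()
                    .filter(p -> p.status == Status.MET)
                    .findFirst()
                    .orElse(group.get(0));
            kept.add(chosen);
        }
        return kept;
    }
}
```

A group with a single element goes through the same code path: there is nothing to filter out, so that element is kept whatever its status.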
Code
The trick is to go through a Map, as its keys guarantee uniqueness. If your prospect number is a specific type, this assumes that equals() and hashCode() are properly defined on it.
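In case the prospect number is a dedicated type, here is a minimal sketch of what "properly defined" could look like. The ProspectNumber class and its single String field are assumptions made for the example, your actual type may hold something else.

```java
import java.util.Objects;

// Hypothetical value type wrapping the prospect number.
final class ProspectNumber {
    private final String value;

    ProspectNumber(String value) {
        this.value = Objects.requireNonNull(value);
    }

    // Two ProspectNumber are equal when they wrap the same value...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ProspectNumber)) return false;
        return value.equals(((ProspectNumber) o).value);
    }

    // ...and equal objects must return the same hash code,
    // otherwise a HashMap would store them under different keys.
    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```

Without these two methods, each instance would be a distinct key and the merge function of the Map would never be called.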
disclaimer: code is untested
List<Prospection> all = prospectionRepository.findAll().stream()
    // we instantiate here a Map<ProspectNumber, Prospect>
    // There is no need to have a Map<ProspectNumber, List<Prospect>>
    // as the merge function will do the sorting for us
    .collect(Collectors.toMap(
        // Key: use the prospect number
        prospect -> prospect.getProspectNumber(),
        // Value: use the prospect object itself
        prospect -> prospect,
        // Merge function: two prospects with the same prospect number
        // are found: keep the one with the MET status or the first one
        (oldProspect, newProspect) -> {
            if (oldProspect.getStatus() == MET) {
                return oldProspect;
            } else if (newProspect.getStatus() == MET) {
                return newProspect;
            } else {
                // return the first one, arbitrary decision
                return oldProspect;
            }
        }
    ))
    // get map values only
    .values()
    // stream it in order to collect it as a List
    .stream()
    .collect(Collectors.toList());
prospectionRepository.save(all);
Map.values() actually returns a Collection. So if your prospectionRepository.save(...) can accept a Collection (not only a List), you can skip the final conversion. I also use the following shorthands:
- Method reference: Prospection::getProspectNumber is the Function equivalent to prospect -> prospect.getProspectNumber()
- Function.identity() is equivalent to prospect -> prospect
- Ternary operator: it returns the same thing but written differently (given that at most one duplicate can be MET)
Collection<Prospection> all = prospectionRepository.findAll().stream()
    .collect(Collectors.toMap(
        Prospection::getProspectNumber,
        Function.identity(),
        (oldProspect, newProspect) -> newProspect.getStatus() == MET ? newProspect : oldProspect
    )).values();
prospectionRepository.save(all);
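If you want to try the merge function without a database, here is a self-contained sketch working on an in-memory list instead of the repository. The Prospection class, its getters and the Status enum are assumptions made for the example (the prospect number is taken to be a String), not your actual entity.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MergeSketch {
    enum Status { NEW, MET }

    // Hypothetical stand-in for the real entity.
    static final class Prospection {
        private final long id;
        private final String prospectNumber;
        private final Status status;

        Prospection(long id, String prospectNumber, Status status) {
            this.id = id;
            this.prospectNumber = prospectNumber;
            this.status = status;
        }

        long getId() { return id; }
        String getProspectNumber() { return prospectNumber; }
        Status getStatus() { return status; }
    }

    // Same collector as above: one entry per prospect number,
    // the MET prospect wins over the one already stored.
    static Collection<Prospection> keepMetOrFirst(List<Prospection> all) {
        return all.stream()
                .collect(Collectors.toMap(
                        Prospection::getProspectNumber,
                        Function.identity(),
                        (oldProspect, newProspect) ->
                                newProspect.getStatus() == Status.MET ? newProspect : oldProspect))
                .values();
    }
}
```

Note that the returned Collection comes from a HashMap, so its iteration order should not be relied upon.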
For your information, if two Prospection having the same prospect number were equal according to equals(), then a simple distinct() would have been enough:
List<Prospection> all = prospectionRepository.findAll()
    .stream()
    .distinct()
    .collect(Collectors.toList());
prospectionRepository.save(all);
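To make the distinct() case concrete, here is a sketch where equality is based on the prospect number only. The Prospection class below is a hypothetical stand-in, and defining equals() this way on a real entity is an assumption that may not suit your model. Keep in mind that on an ordered stream distinct() keeps the first occurrence of each duplicate, so it does not give any priority to the MET status.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctSketch {
    // Hypothetical entity whose equality only looks at the prospect number.
    static final class Prospection {
        final long id;
        final String prospectNumber;

        Prospection(long id, String prospectNumber) {
            this.id = id;
            this.prospectNumber = prospectNumber;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Prospection
                    && prospectNumber.equals(((Prospection) o).prospectNumber);
        }

        @Override
        public int hashCode() {
            return prospectNumber.hashCode();
        }
    }

    // distinct() relies on equals()/hashCode() to drop the duplicates.
    static List<Prospection> dedup(List<Prospection> all) {
        return all.stream()
                .distinct()
                .collect(Collectors.toList());
    }
}
```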