I am looking for a way to implement a non-terminal (intermediate) grouping operation with minimal memory overhead.
For example, consider distinct(). In the general case, it has no choice but to collect all distinct items, and only then stream them forward. However, if we know that the input stream is already sorted, the operation could be done "on-the-fly", using minimal memory.
I know I can achieve this for iterators by writing an iterator wrapper and implementing the grouping logic myself. Is there a simpler way to implement this using the Streams API instead?
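For reference, the iterator-wrapper approach mentioned above might look roughly like the following. This is only a minimal sketch: `DedupIterator` and `dedupStream` are hypothetical names (not part of the JDK), and it assumes the input is already sorted, or at least that equal elements are adjacent. It keeps only the previous element in memory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: skips elements equal to the one returned just before.
final class DedupIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T lookahead;          // next element to return, if any
    private boolean hasLookahead;
    private T prev;               // last element returned
    private boolean hasPrev;

    DedupIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source until an element differs from the previous one.
        while (!hasLookahead && source.hasNext()) {
            T candidate = source.next();
            if (!hasPrev || !Objects.equals(candidate, prev)) {
                lookahead = candidate;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasLookahead = false;
        hasPrev = true;
        prev = lookahead;
        return lookahead;
    }

    // Exposes the wrapped iterator as a sequential, ordered stream.
    static <E> Stream<E> dedupStream(Iterator<E> source) {
        Iterator<E> it = new DedupIterator<>(source);
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<Integer> out = dedupStream(Arrays.asList(1, 1, 3, 3, 3, 4, 4, 5).iterator())
                .collect(Collectors.toList());
        System.out.println(out);
    }
}
```

The stream is created as sequential on purpose, since the wrapper relies on seeing elements in encounter order.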
--EDIT--
I found a way to abuse Stream.flatMap(..) to achieve this:
private static class DedupSeq implements IntFunction<IntStream> {
    // Last value seen; null until the first element arrives.
    private Integer prev;

    @Override
    public IntStream apply(int value) {
        // Emit the value only if it differs from its predecessor.
        IntStream res = (prev != null && value == prev) ? IntStream.empty() : IntStream.of(value);
        prev = value;
        return res;
    }
}
And then:
IntStream.of(1,1,3,3,3,4,4,5).flatMap(new DedupSeq()).forEach(System.out::println);
Which prints:
1
3
4
5
With some changes, the same technique can be used for any kind of memory-efficient sequence grouping of streams. Still, I don't much like this solution, and I was looking for something more natural (like the way mapping or filtering work, for example). Furthermore, I'm breaking the contract here, because the function supplied to flatMap(..) is stateful.
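To illustrate what a more general sequence grouping could look like without a stateful flatMap(..) function, here is a minimal sketch that keeps the state inside a custom Spliterator instead. `RunGrouping` and `runs` are hypothetical names, and the sketch assumes a sequential source in which equal elements are adjacent. Memory use is bounded by the longest run rather than by the whole stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: groups consecutive equal elements into lists, lazily.
final class RunGrouping {

    static <T> Stream<List<T>> runs(Stream<T> sorted) {
        Spliterator<T> src = sorted.spliterator();
        Spliterator<List<T>> grouped =
                new Spliterators.AbstractSpliterator<List<T>>(Long.MAX_VALUE, Spliterator.ORDERED) {
            private List<T> current;    // run being accumulated
            private List<T> completed;  // finished run, ready to emit

            @Override
            public boolean tryAdvance(Consumer<? super List<T>> action) {
                completed = null;
                // Consume source elements until a run is finished or the source ends.
                while (completed == null && src.tryAdvance(this::accept)) {
                    // state is updated in accept(..)
                }
                if (completed == null && current != null) {
                    // Source exhausted: flush the last run.
                    completed = current;
                    current = null;
                }
                if (completed == null) {
                    return false;
                }
                action.accept(completed);
                return true;
            }

            private void accept(T item) {
                if (current == null) {
                    current = new ArrayList<>();
                } else if (!Objects.equals(current.get(0), item)) {
                    // A different value closes the current run and starts a new one.
                    completed = current;
                    current = new ArrayList<>();
                }
                current.add(item);
            }
        };
        return StreamSupport.stream(grouped, false).onClose(sorted::close);
    }

    public static void main(String[] args) {
        List<List<Integer>> out = runs(Stream.of(1, 1, 3, 3, 3, 4, 4, 5))
                .collect(Collectors.toList());
        System.out.println(out);
    }
}
```

Unlike the flatMap(..) trick, this can flush the final group when the source ends, which a per-element function has no way of detecting.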
.filter(someSet::add), but have you tried and compared the performance of such a solution with a plain distinct()? Also, you say "in the general case", but it may be that there is an optimized implementation in the event that the Stream is ORDERED, precisely (or more accurately, its underlying Spliterator).
.forEachOrdered()?
DISTINCT and SORTED. But, looking at the JDK 8 code, the IntStream implementation does not make use of either for .distinct(). Reference-based streams, on the other hand, seem to.