25

I have multiple iterators which return items in a sorted manner according to some sorting criterion. Now, I would like to merge (multiplex) the iterators into one, combined iterator. I know how to do it in Java style, with e.g. tree-map, but I was wondering if there is a more functional approach? I want to preserve the laziness of the iterators as much as possible.

1

3 Answers 3

43

You can just do:

val it = iter1 ++ iter2

It creates another iterator and does not evaluate the elements, but wraps the two existing iterators. It is fully lazy, so you are not supposed to use iter1 or iter2 once you do this.

In general, if you have more iterators to merge, you can use folding:

val iterators: Seq[Iterator[T]] = ???
val it = iterators.foldLeft(Iterator[T]())(_ ++ _)

If you have some ordering on the elements that you would like to maintain in the resulting iterator but you want lazyness, you can convert them to streams:

def merge[T: Ordering](iter1: Iterator[T], iter2: Iterator[T]): Iterator[T] = {
  val s1 = iter1.toStream
  val s2 = iter2.toStream

  def mergeStreams(s1: Stream[T], s2: Stream[T]): Stream[T] = {
    if (s1.isEmpty) s2
    else if (s2.isEmpty) s1
    else if (s1.head < s2.head) s1.head #:: mergeStreams(s1.tail, s2)
    else s2.head #:: mergeStreams(s1, s2.tail)
  }

  mergeStreams(s1, s2).iterator
}

Not necessarily faster though, you should microbenchmark this.

A possible alternative is to use buffered iterators to achieve the same effect.

Sign up to request clarification or add additional context in comments.

12 Comments

OK, how do I make sure that the relative ordering according to the same sorting criteria is maintained? Let's say I have an object that has a timestamp in a form of DateTime. I would like these two iterators to be merged accroding to timestamps, not one after the other (in Java I would use comparator)
Thanks, but I defeinitely dont want to use streams as they cache the elements. Besides, can I provide Ordering on the actual elements e.g. like in Java Comparator that you can pass to collections as an argument?
This was done in the example by using the Ordering context bound on T. The < comes from an implicit conversion to a value class, added in 2.10., but you could still use the compare method of the Ordering otherwise.
The memory footprint complexity and the asymptotic running time complexity stay the same with streams - it's only the absolute performance that could be worse.
"The memory footprint complexity and the asymptotic running time complexity stay the same with streams" - stackoverflow.com/questions/5159000/… this begs to differ in terms of memory complexity
|
4

Like @axel22 mentioned, you can do this with BufferedIterators. Here's one Stream-free solution:

def combine[T](rawIterators: List[Iterator[T]])(implicit cmp: Ordering[T]): Iterator[T] = {
  new Iterator[T] {
    private val iterators: List[BufferedIterator[T]] = rawIterators.map(_.buffered)

    def hasNext: Boolean = iterators.exists(_.hasNext)

    def next(): T = if (hasNext) {
      iterators.filter(_.hasNext).map(x => (x.head, x)).minBy(_._1)(cmp)._2.next()
    } else {
      throw new UnsupportedOperationException("Cannot call next on an exhausted iterator!")
    }
}

Comments

3

You could try:

(iterA ++ iterB).toStream.sorted.toIterator

For example:

val i1 = (1 to 100 by 3).toIterator
val i2 = (2 to 100 by 3).toIterator
val i3 = (3 to 100 by 3).toIterator

val merged = (i1 ++ i2 ++ i3).toStream.sorted.toIterator

merged.next  // results in: 1
merged.next  // results in: 2
merged.next  // results in: 3

1 Comment

Oops, my bad. I see you don't want to use Streams.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.