Scala iterators are confusing

Question

(The problem statement was posted as an image, which is not included here.)

I have tried very hard to understand why iterators behave like this. After evaluating

result = lines.filter(_.nonEmpty).map(_.toInt)

once, the iterator's buffer is overwritten with all elements except the last one.

For example, if my input text file has 5 elements, then after evaluating

result = lines.filter(_.nonEmpty).map(_.toInt)

5 times, my iterator becomes empty.

Any help is much appreciated. Thanks in advance.
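A minimal sketch that reproduces the symptom outside the REPL, assuming that the REPL calls hasNext on `result` when it prints it (as the comments on the answers point out). The object name, `freshLines`, and the input values "1" to "5" are hypothetical stand-ins for the text file.

```scala
object ReplGotchaDemo {
  // Hypothetical stand-in for the input file: five non-empty lines.
  def freshLines(): Iterator[String] = Iterator("1", "2", "3", "4", "5")

  // Re-evaluates the question's expression against the SAME `lines` iterator,
  // calling hasNext each time the way the REPL does when it prints `result`.
  // Returns how many elements are still left in `lines` afterwards.
  def remainingAfter(evaluations: Int): Int = {
    val lines = freshLines()
    for (_ <- 1 to evaluations) {
      val result = lines.filter(_.nonEmpty).map(_.toInt)
      result.hasNext // pulls one element out of `lines` into filter's buffer
    }
    lines.size // drains `lines`; done here only to count what is left
  }

  def main(args: Array[String]): Unit =
    for (n <- 0 to 5) println(s"after $n evaluations: ${remainingAfter(n)} left in lines")
}
```

Each evaluation builds a new filter on top of the same `lines`, and each hasNext strands one element in a filter iterator that is then thrown away, so five evaluations leave `lines` empty.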

@victor-moroz I see now you were demonstrating the gotcha. So the answer is, "Yes, iterators are very confusing when you misuse them." We have notions like "fail-fast" to say, "If I misuse you, please blow up in a way I can debug easily."
som-snytt, commented Oct 29, 2016 at 20:53
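To make the "fail-fast" idea concrete, here is a hypothetical wrapper (not part of the Scala library) that retires itself as soon as filter or map is called on it and throws if it is used afterwards. Only filter and map are guarded in this sketch; other transformations pass through unchecked.

```scala
// Hypothetical fail-fast wrapper; NOT part of the Scala standard library.
final class FailFastIterator[A](underlying: Iterator[A]) extends Iterator[A] {
  private var retired = false

  private def checkUsable(): Unit =
    if (retired) throw new IllegalStateException("iterator used after being transformed")

  def hasNext: Boolean = { checkUsable(); underlying.hasNext }
  def next(): A = { checkUsable(); underlying.next() }

  // filter and map retire this iterator and return a fresh fail-fast one.
  override def filter(p: A => Boolean): Iterator[A] = {
    checkUsable()
    retired = true
    new FailFastIterator(underlying.filter(p))
  }

  override def map[B](f: A => B): Iterator[B] = {
    checkUsable()
    retired = true
    new FailFastIterator(underlying.map(f))
  }
}

object FailFastDemo {
  def main(args: Array[String]): Unit = {
    val lines = new FailFastIterator(Iterator("1", "", "2"))
    val result = lines.filter(_.nonEmpty).map(_.toInt)
    println(result.toList)
    // Calling lines.hasNext here would throw IllegalStateException.
  }
}
```

With this wrapper, re-running `lines.filter(...)` in the REPL would fail loudly on the second evaluation instead of silently losing an element.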

som-snytt · Accepted Answer · 2016-10-29 05:56:22Z

5

The doc is very clear that you must discard an iterator after invoking any method except next and hasNext.

http://www.scala-lang.org/api/2.11.8/#scala.collection.Iterator
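A sketch of usage that respects that contract: transform once, bind the result, and never touch `lines` again. If the numbers are needed more than once, materialize them with toList. `TransformOnceDemo`, `parse`, and the sample input are illustrative names and values, not from the question.

```scala
object TransformOnceDemo {
  // Transform the source iterator exactly once and keep only the result.
  // `toList` materializes the numbers so they can be traversed repeatedly.
  def parse(lines: Iterator[String]): List[Int] = {
    val numbers: Iterator[Int] = lines.filter(_.nonEmpty).map(_.toInt)
    // From here on, only `numbers` may be used; `lines` must be discarded.
    numbers.toList
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input; in the question this comes from a text file.
    val values = parse(Iterator("1", "", "2", "3"))
    println(values.sum) // safe: a List can be traversed any number of times
    println(values.max)
  }
}
```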

answered Oct 29, 2016 at 5:56


Tim · Accepted Answer · 2016-10-29 22:41:28Z

0

som-snytt is right here, but didn't explain exactly what was going on.

When you transform an iterator, you need to save the result of the transformation and use only that. In particular, calling filter on an iterator internally buffers it: it calls next on the original iterator and saves that element in a head variable. If you call next on the filtered iterator, you get 4. If you call next on the original iterator, you get 8: your first element is gone. If you'd instead written:

var result = lines.filter(_.nonEmpty).map(_.toInt)
result = result.filter(_ => true)  // keep transforming `result`, never `lines`
result = result.filter(_ => true)

You could repeat the last line as many times as you want without the iterator becoming empty, because you're always operating on the transformed iterator.
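A sketch contrasting the two patterns, assuming hasNext is called after every step as the REPL does when printing. The five-element input and the `filter(_ => true)` step are hypothetical; any transformation of `result` itself behaves the same way.

```scala
object RebindDemo {
  // Hypothetical five-line input.
  private def freshLines(): Iterator[String] = Iterator("1", "2", "3", "4", "5")

  // Keeps transforming `result` itself, calling hasNext each round as the REPL would.
  def rebindResult(rounds: Int): List[Int] = {
    val lines = freshLines()
    var result: Iterator[Int] = lines.filter(_.nonEmpty).map(_.toInt)
    result.hasNext
    for (_ <- 1 to rounds) {
      result = result.filter(_ => true) // transform `result`, never `lines`
      result.hasNext
    }
    result.toList
  }

  // Re-derives `result` from `lines` each round: every round strands one element
  // in a filter iterator that is then discarded.
  def rederiveFromLines(rounds: Int): List[Int] = {
    val lines = freshLines()
    var result: Iterator[Int] = lines.filter(_.nonEmpty).map(_.toInt)
    result.hasNext
    for (_ <- 1 to rounds) {
      result = lines.filter(_.nonEmpty).map(_.toInt)
      result.hasNext
    }
    result.toList
  }
}
```

In the first method the buffered element is passed down the chain and never lost; in the second, each new filter starts over on `lines` and the previous filter's buffered element disappears with it.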

EDIT: to address the buffering comment -- here's the code for Iterator.filter:

def filter(p: A => Boolean): Iterator[A] = new AbstractIterator[A] {
  private var hd: A = _
  private var hdDefined: Boolean = false

  def hasNext: Boolean = hdDefined || {
    do {
      if (!self.hasNext) return false
      hd = self.next()
    } while (!p(hd))
    hdDefined = true
    true
  }

  def next() = if (hasNext) { hdDefined = false; hd } else empty.next()
}

The hd and hdDefined variables perform exactly the same buffering that is used in Iterator.buffered.
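The following sketch, with hypothetical names, shows that the filter iterator's one-element lookahead has the same visible effect as Iterator.buffered: peeking once removes one element from the underlying iterator. It reflects how the library currently behaves and is meant only to explain the symptom, not as something to rely on.

```scala
object LookaheadDemo {
  // Peeks with Iterator.buffered, then returns what is left in the underlying iterator.
  def leftAfterBufferedHead(): List[Int] = {
    val base = Iterator(1, 2, 3)
    val peeking = base.buffered
    peeking.head // caches the first element of `base` inside `peeking`
    base.toList  // the cached element is no longer available from `base`
  }

  // Same experiment with filter: hasNext caches one element in filter's `hd` field.
  def leftAfterFilterHasNext(): List[Int] = {
    val base = Iterator(1, 2, 3)
    val filtered = base.filter(_ => true)
    filtered.hasNext
    base.toList
  }
}
```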

edited Oct 29, 2016 at 22:41

answered Oct 29, 2016 at 21:57


4 Comments

Victor Moroz Over a year ago

.filter doesn't buffer anything when you call it on an iterator, but when you call hasNext on the result of .filter, it consumes some (possibly all) elements of the original iterator. That is what happens in the REPL, because the REPL calls hasNext when it prints the result. I had an answer explaining how it actually works, but such an explanation is irrelevant. Just don't do it.

Victor Moroz Over a year ago

The Iterator.filter code directly contradicts your statement, though. Calling .filter simply returns a new object; buffering happens in hasNext and next.
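A small sketch of the two comments above: an instrumented source counts how many elements are pulled from it, first right after calling filter and then after one hasNext call. The counting `map` and the sample values are hypothetical instrumentation, not library features.

```scala
object FilterLazinessDemo {
  // Returns (elements pulled from the source right after calling filter,
  //          elements pulled after one hasNext call on the filtered iterator).
  def pulledCounts(p: Int => Boolean): (Int, Int) = {
    var pulled = 0
    // Instrumented source: the map counts every element handed out via next().
    val source = Iterator(1, 2, 3, 4, 5).map { x => pulled += 1; x }
    val filtered = source.filter(p)
    val afterFilter = pulled
    filtered.hasNext
    (afterFilter, pulled)
  }
}
```

Calling filter pulls nothing; hasNext pulls until it finds a match, so a predicate that rejects everything drains the whole source, which is the "some (all)" case.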

som-snytt Over a year ago

I'm not sure what the code snippet is meant to demonstrate, but it's worth saying that particular methods on particular iterators may or may not be destructive; those are implementation details, and you must never rely on them.

Tim Over a year ago

Yes, you're absolutely right. I tried to provide an implementation detail to explain why you can't rely on the original, but I probably just misdirected attention from the important point (don't reuse base iterators after transformations). Sorry!
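If two passes over the same data are genuinely needed, the contract-respecting options are a materialized collection or Iterator.duplicate, whose documentation says the iterator it was called on should be discarded afterwards. A sketch with hypothetical names and input; note that duplicate queues elements in memory when one copy runs ahead of the other.

```scala
object DuplicateDemo {
  // Two passes over the parsed numbers without materializing them up front.
  // After `duplicate`, only the two returned iterators may be used, never `numbers`.
  def sumAndCount(lines: Iterator[String]): (Int, Int) = {
    val numbers = lines.filter(_.nonEmpty).map(_.toInt)
    val (forSum, forCount) = numbers.duplicate
    (forSum.sum, forCount.size)
  }
}
```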
