1


I have tried very hard to understand why iterators behave the way they do. After I evaluate the following line once

result = lines.filter(_.nonEmpty).map(_.toInt)

the iterator buffer is overwritten with all elements except the last element.

That is, if my input text file has 5 elements, then after evaluating the following line 5 times

result = lines.filter(_.nonEmpty).map(_.toInt)

my iterator becomes empty.

Any help is much appreciated. Thanks in advance.

1
  • @victor-moroz I see now you were demonstrating the gotcha. So the answer is, "Yes, iterators are very confusing when you misuse them." We have notions like "fail-fast" to say, "If I misuse you, please blow up in a way I can debug easily." Commented Oct 29, 2016 at 20:53

2 Answers

5

The doc is very clear that you must discard an iterator after invoking any method except next and hasNext.
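A minimal sketch of that rule in practice, assuming the input arrives as an Iterator[String]; the object name, the parse helper, and the sample values are made up for illustration. The transformation is applied once, the original iterator is never touched again, and the result is copied into a List in case it has to be read more than once.

```scala
object DiscardAfterTransform {
  // Hypothetical helper: `lines` is consumed here and must not be reused by the caller.
  def parse(lines: Iterator[String]): List[Int] =
    lines
      .filter(_.nonEmpty) // skip blank lines
      .map(_.toInt)       // convert the remaining lines to Int
      .toList             // materialize, so the result can be traversed repeatedly

  def main(args: Array[String]): Unit = {
    // Sample data standing in for the lines of the input file.
    val numbers = parse(Iterator("4", "", "8", "15"))
    println(numbers)     // a List can be read as often as needed
    println(numbers.sum)
  }
}
```

Unlike an iterator, the resulting List is not consumed by reading it, so evaluating an expression over it several times gives the same answer each time.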

http://www.scala-lang.org/api/2.11.8/#scala.collection.Iterator


0

som-snytt is right here, but didn't explain what exactly was going on.

When you transform an iterator, you need to save the result of the transformation and only use that. In particular, calling filter on an iterator internally buffers it, which calls next on the original iterator and saves it in a head variable. If you call next on the buffered thing, you get 4. If you call next on the original iterator, you get 8: your first element is gone. If you'd instead written:

var result = lines.filter(_.nonEmpty).map(_.toInt)
// result is an Iterator[Int] from here on, so test the Int's string form
result = result.filter(_.toString.nonEmpty).map(_.toInt)
result = result.filter(_.toString.nonEmpty).map(_.toInt)

You could repeat the last line as many times as you want without the iterator becoming empty, because you're always operating on the transformed iterator.
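The point can be checked with a small sketch; the object name and sample values are hypothetical, and the hasNext call is only there to mimic what the REPL does when it prints an iterator. Each pass wraps the previous result instead of going back to lines, so no element is lost however many times the step is repeated.

```scala
object ChainedTransforms {
  def run(): List[Int] = {
    // Sample data standing in for the lines of the input file.
    val lines = Iterator("4", "", "8", "15")
    var result: Iterator[Int] = lines.filter(_.nonEmpty).map(_.toInt)

    for (_ <- 1 to 5) {
      // Mimic the REPL, which calls hasNext when printing the value.
      require(result.hasNext)
      // Always transform the latest iterator, never `lines` again.
      result = result.filter(_.toString.nonEmpty).map(_.toInt)
    }
    result.toList
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Any element that an inner iterator has already buffered is handed to the wrapper on its next call, which is why the chain stays complete.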

EDIT: to address the buffering comment -- here's the code for Iterator.filter:

def filter(p: A => Boolean): Iterator[A] = new AbstractIterator[A] {
  private var hd: A = _
  private var hdDefined: Boolean = false

  def hasNext: Boolean = hdDefined || {
    do {
      if (!self.hasNext) return false
      hd = self.next()
    } while (!p(hd))
    hdDefined = true
    true
  }

  def next() = if (hasNext) { hdDefined = false; hd } else empty.next()
}

The hd and hdDefined variables perform exactly the same buffering that is used in Iterator.buffered.
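A minimal sketch of the effect, assuming the standard library behaves like the source quoted above; the object name and sample values are made up for illustration. Calling hasNext on the filtered iterator pulls one element out of the original into hd, so a later next on the original skips it. This is an implementation detail and should not be relied on; it only illustrates why the original iterator has to be discarded.

```scala
object LostElementDemo {
  def run(): (Int, String) = {
    // Sample data standing in for the lines of the input file.
    val lines = Iterator("4", "8", "15")
    val result = lines.filter(_.nonEmpty).map(_.toInt)

    // The REPL does this when it prints `result`; it moves "4" into the filter's buffer.
    require(result.hasNext)

    val fromOriginal = lines.next() // "4" is already gone from `lines`
    val fromResult = result.next()  // the buffered element, converted to Int
    (fromResult, fromOriginal)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

With the implementation shown above, the transformed iterator yields 4 while the original yields 8, matching the behavior described in the answer.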

4 Comments

.filter doesn't buffer anything when you call it on an iterator, but when you call hasNext on the result of .filter, it will consume some (possibly all) elements of the original iterator. That is what happens in the REPL, since the REPL calls hasNext. I had an answer explaining how it actually works, but such an explanation is irrelevant. Just don't do it.
Iterator.filter code directly contradicts your statement though. Calling .filter simply returns a new object, buffering happens in hasNext and next.
Not sure what the code snippet means to demonstrate, but it's worth saying that particular methods on particular iterators may/may not be destructive, but those are implementation details, and you must never rely on them.
Yes, you're absolutely right. I tried to provide an implementation detail to explain why you can't rely on the original, but I probably just misdirected attention from the important point (don't reuse base iterators after transformations). Sorry!
