3

To iterate an Iterator, we can call its foreach or use while loop. The implementation of foreach is:

def foreach[U](f: A =>  U) { while (hasNext) f(next()) }

So I think foreach should as fast as while(iterator.hasNext), but after doing some test, the results surprise me very much.

my test code:

def getSize2[T](i: Iterator[T]) = {
  var count = 0
  val f = (a: T) => count += 1
  while(i.hasNext) {
    f(i.next)
  }
  count
}

def getSize3[T](i: Iterator[T]) = {
  var count = 0
  val f = (a: T) => count += 1
  i.foreach(f)
  count
}

It is very weird that getSize2 is 3 times faster than getSize3!

Anyone know what happened there?

Edit: paste my test program

def main(args: Array[String]) {
  val data = 0 to 100000000

  val start2 = System.nanoTime
  (0 to 100).foreach(_ => getSize2(data.iterator))
  println("get size, while loop, using function: " + (System.nanoTime - start2)/1000000)

  val start3 = System.nanoTime
  (0 to 100).foreach(_ => getSize3(data.iterator))
  println("get size, foreach: " + (System.nanoTime - start3)/1000000)

}

My OS: ubuntu 12.04, scala version: 2.10.3

4
  • 1
    Can you write how you measured the execution speed? I mean actual code and command line you executed. additionally scala version and which OS you are running. Commented May 16, 2014 at 6:13
  • 1
    Look at the bytecode and you'll see the foreach is doing more invocations. Commented May 16, 2014 at 12:58
  • 1
    That's interesting because I'm seeing opposite results. while (iter.hasNext) { count += 1; iter.next() } takes some 6 times longer. Commented May 16, 2014 at 17:21
  • An update: Checking this result with Scala 2.11.12, I get that the given test code runs the while loop about 30% faster using an underlying mutable.ArrayBuffer with a million integers. However, if one changes the "Iterable[T]" parameter to "Iterable[T]" (adding "val i = input.iterator" to the while loop) then the while loop runs about the same speed, but the "foreach" runs over ten times faster here. Commented Jan 3, 2019 at 0:44

1 Answer 1

4

The while loop is faster because a function call is not free and cannot always be removed by the JIT compiler. In particular, var count is wrapped in an anonymous object so it can get access to it from within the function object, and to really speed things up the JIT compiler needs to unwrap everything and then finally realize that it never needed the anonymous object at all.

Adding the extra layer of a function call out to the library foreach really complicates the JIT compiler's analysis (three layers of indirection instead of two, etc.).

Sign up to request clarification or add additional context in comments.

2 Comments

I also wrap var count in an anonymous function object in the while loop version. Do you mean JIT is more likely to remove function call in the while loop version rather than the foreach version?
@cloud - Yes, that's what I mean. There's less work to do, which often makes a difference. (Edited the answer to be clearer about that.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.