Scala 2.10 benchmark: are generic methods from the collections useless when performance is important?

Question

I have benchmarked several ways to fold a large array of primitives (both "directly" and with iterators), and the results are disappointing. (Yes, I did warmup, intermediate GC and many run passes; the JVM runs in server mode, scalac optimisations are enabled, and debugging info is disabled.)

I think the code is too big to post here, so here is a link: http://pastebin.com/18dWWBM4. The only method there that runs nearly as fast as a plain old imperative loop is this not-so-generic hand-written function:

@inline def array_foldl[@specialized A, @specialized B](init: B)(src: Array[A])(fun: (B, A) => B): B = {
  var res = init
  var i = 0
  val len = src.length // never reassigned, so a val is enough
  while (i < len) {
    res = fun(res, src(i))
    i += 1
  }
  res
}
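For comparison, the "plain old imperative loop" that array_foldl is measured against presumably looks roughly like the following. This is an assumption, since the pastebin code is not reproduced here, and the names Baseline and sumWhile are hypothetical.

```scala
// Hypothetical baseline (not the pastebin's code): a hand-written while loop
// summing a primitive Array[Long] with no function object and no boxing.
object Baseline {
  def sumWhile(src: Array[Long]): Long = {
    var res = 0L
    var i = 0
    val len = src.length
    while (i < len) {
      res += src(i) // primitive long addition on an element of a long[]
      i += 1
    }
    res
  }
}
```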

The other, visually nicer methods are complete outsiders. Also, using iterator abstractions fails in all cases, with a hand-written imitation of the standard Iterator called SpecializedIterator being only slightly faster. So what's the problem? Can it be improved somehow? Is there a way to make a "fast" iterator, or is there a big problem with the principle itself?
Thanks for your attention.
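The "visually nice" variants are only in the pastebin; as an assumption, typical representatives would be the library's own folds over the array and over its iterator. Per the accepted answer, in Scala 2.10 both run through unspecialized collection code, which is where the boxing comes from. The object name GenericFolds is hypothetical.

```scala
// Assumed representatives of the "visually nice" variants (not the pastebin's
// exact methods). In Scala 2.10 both go through unspecialized collection code.
object GenericFolds {
  // ArrayOps.foldLeft: the op is a generic (B, A) => B, so elements are boxed
  def viaArrayFold(xs: Array[Long]): Long = xs.foldLeft(0L)(_ + _)

  // Iterator[Long] is not specialized: next() returns a boxed java.lang.Long
  def viaIterator(xs: Array[Long]): Long = xs.iterator.foldLeft(0L)(_ + _)
}
```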

Rex is probably right, but FP techniques also suffer from problems related to megamorphic call-site optimization (or the lack thereof). – Daniel C. Sobral, commented Feb 12, 2013 at 2:31
@DanielC.Sobral OK, so I understand the boxing and specialization problem, but what's wrong with the SpecializedIterator class in the example? Does the difference come from the JIT being unable to inline code? In that case, can it be improved, or does using any kind of Iterator inherently sacrifice a lot of speed? – Display Name, commented Feb 12, 2013 at 3:15
I've edited my post to specifically address the iterator issue. – Rex Kerr, commented Feb 12, 2013 at 10:46

Rex Kerr · Accepted Answer · 2013-02-12 10:45:35Z


The problem is boxing. Creating an object takes much longer than adding two numbers, but if you use generic (non-specialized) folds, you have to create an object every time. The problem with just specializing everything is that it makes the entire library 100x larger, since you need every combination of two primitive parameters (including with non-primitives), plus the original no-type-parameter version. (100x because there are 8 primitives plus Unit plus AnyRef/non-specialized T, i.e. 10 choices for each of the two type parameters, and 10 × 10 = 100.) This is untenable, and since there is no readily available alternative solution, the collections are presently unspecialized.
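To make the boxing point concrete, here is a minimal sketch (hypothetical object and method names) of the same fold written generically and with @specialized. The generic version erases A and B to Object, so folding an Array[Long] boxes every element and every intermediate result; the specialized version is what makes scalac emit extra primitive copies of the method, and that per-combination copying is exactly what blows up the library size.

```scala
object BoxingSketch {
  // Generic: A and B erase to Object, so for Array[Long] every element read
  // and every intermediate result passes through a boxed java.lang.Long.
  def foldGeneric[A, B](init: B)(src: Array[A])(fun: (B, A) => B): B = {
    var res = init
    var i = 0
    while (i < src.length) {
      res = fun(res, src(i))
      i += 1
    }
    res
  }

  // Specialized: scalac additionally emits one copy of this method per
  // combination of primitive types for A and B. For A = B = Long that copy
  // reads the long[] directly and can call Function2's primitive apply.
  def foldSpecialized[@specialized A, @specialized B](init: B)(src: Array[A])(fun: (B, A) => B): B = {
    var res = init
    var i = 0
    while (i < src.length) {
      res = fun(res, src(i))
      i += 1
    }
    res
  }
}
```

Both compute the same result; the difference only shows up in the bytecode and in the timings.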

Also, specialization itself is relatively new and thus still has some shortcomings in its implementation. In particular, you seem to have hit one with SpecializedIterator: the function in foreach doesn't end up specialized (I collapsed the trait/object structure into a single class to make it easier to track down):

public class Main$SpecializedArrayIterator$mcJ$sp extends Main$SpecializedArrayIterator{
public final void foreach$mcJ$sp(scala.Function1);
  Code:
   0:   aload_0
   1:   invokevirtual   #39; //Method Main$SpecializedArrayIterator.hasNext:()Z
   4:   ifeq    24
   7:   aload_1
   8:   aload_0
   9:   invokevirtual   #14; //Method next$mcJ$sp:()J
   12:  invokestatic    #45; //Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
   15:  invokeinterface #51,  2; //InterfaceMethod scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;
   20:  pop
   21:  goto    0
   24:  return

See the boxing call at offset 12, followed by a call to the unspecialized Function1.apply? Oops. (The tuple (A, (A,A) => A) used in sum also messes up specialization.) An implementation like this runs at full speed:
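The listing shows the long being boxed and handed to the generic Function1.apply. If you still want a foreach, one way to keep it on the primitive path is to fix the callback's result type to Unit, since Function1 is specialized for argument types Int, Long, Float and Double and for result types including Unit. This is a sketch under that assumption; ArrayForeach is a hypothetical name, not the pastebin's SpecializedIterator.

```scala
// Hypothetical class. The callback is typed A => Unit rather than a generic
// A => U, so for A = Long both of Function1's type arguments are in its
// specialized set and the call below need not box.
class ArrayForeach[@specialized(Int, Long, Double) A](src: Array[A]) {
  final def foreach(f: A => Unit): Unit = {
    var i = 0
    while (i < src.length) {
      f(src(i))
      i += 1
    }
  }
}
```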

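class SpecializedArrayIterator[@specialized A](src: Array[A]) {
  var i = 0          // index of the next element to return
  val l = src.length // array length, read once
  @inline final def hasNext: Boolean = i < l
  @inline final def next(): A = { val res = src(i); i += 1; res }
  // Folds the remaining elements in a plain while loop; with A and B both
  // specialized (e.g. Long), op need not box its arguments or result.
  @inline final def foldLeft[@specialized B](z: B)(op: (B, A) => B): B = {
    var result = z
    while (hasNext) result = op(result, next())
    result
  }
}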

...
measure((new SpecializedArrayIterator[Long](test)).foldLeft(0L)(_ + _))
...

With results like these:

Launched 51298 times in 2000 milliseconds, ratio = 25.649    // New impl
Launched 51614 times in 2000 milliseconds, ratio = 25.807    // While loop
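The ratio column is simply launches per millisecond (51298 / 2000 = 25.649 and 51614 / 2000 = 25.807). The pastebin's measure is not reproduced here, so the following is a hypothetical stand-in that prints in the same format; it omits the warmup and intermediate GC the question describes and is only meant to show what the numbers mean.

```scala
// Hypothetical stand-in for the pastebin's measure (no warmup, no GC step).
object Bench {
  // Writing each result to a volatile field makes it harder for the JIT to
  // discard the benchmarked work as dead code.
  @volatile private var blackhole: Int = 0

  // Runs `body` repeatedly for `millis` milliseconds and returns the run count.
  def measure[T](body: => T, millis: Long = 2000L): Long = {
    val deadline = System.currentTimeMillis() + millis
    var runs = 0L
    while (System.currentTimeMillis() < deadline) {
      blackhole = body.hashCode
      runs += 1
    }
    println(s"Launched $runs times in $millis milliseconds, ratio = ${runs.toDouble / millis}")
    runs
  }
}
```

Usage mirrors the answer, e.g. Bench.measure(new SpecializedArrayIterator[Long](test).foldLeft(0L)(_ + _)) with the answer's class and array in scope.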

answered Feb 11, 2013 at 23:34 by Rex Kerr, edited Feb 12, 2013 at 10:45

Comments

Display Name

Can specialization be postponed until the user code is compiled? In theory this should be easy as long as the library source code is available, and then we could specialize only the combinations that are actually used (something like C++ templates). Also, why is the specialized Iterator so slow then?

Randall Schulz

@SargeBorsch: In my very limited understanding, something like this (but based on bytecode or its equivalent, not source) is what is done (automatically and transparently) in the C# / .NET / CLR virtual machine.

Randall Schulz

Secondly, Scala does specialize selectively. I believe the most common use of Scala's @specialized explicitly enumerates the specific types for which to specialize the class (per type parameter). I seem to recall that some of the Scala Standard Library classes are specialized.
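A minimal sketch of that explicit form, with a hypothetical class name: only the listed types get dedicated variants, and every other type argument falls back to the generic, boxing class.

```scala
// Hypothetical class illustrating selective specialization: variants exist
// only for Int, Long and Double; Float, String, etc. use the generic class.
class Vec2[@specialized(Int, Long, Double) A](val x: A, val y: A) {
  // A method's own type parameter can be restricted the same way.
  def map[@specialized(Int, Long, Double) B](f: A => B): Vec2[B] =
    new Vec2[B](f(x), f(y))
}
```

Restricting the list keeps the number of generated variants small, at the cost of boxing for every type that is left out.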

Display Name

@RandallSchulz OK, but the question is solely about Scala/JVM. As for your second comment: yes, I know, but that is orthogonal to the problem. In that case they are just throwing away some types (which will never be specialized regardless of what the user writes), but the m^n code bloat remains (with a smaller m, but with a large n the entire approach is bad anyway, unless m equals 1). Also, restricting specialization to some of the primitive types is generally a bad idea, because nobody knows which types will be needed in the future.
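The m^n growth can be made concrete with the numbers from the answer, where m counts the choices per type parameter (8 primitives, Unit and AnyRef give m = 10) and n counts the specialized type parameters:

```latex
\begin{aligned}
N &= m^{n} \\
m = 10,\; n = 2 &\;\Rightarrow\; N = 10^{2} = 100 \\
m = 10,\; n = 3 &\;\Rightarrow\; N = 10^{3} = 1000 \\
m = 3 \ (\texttt{Int}, \texttt{Long}, \texttt{Double}),\; n = 3 &\;\Rightarrow\; N = 3^{3} = 27 \\
m = 1 &\;\Rightarrow\; N = 1 \text{ for every } n
\end{aligned}
```

Shrinking m lowers the base, but as long as m > 1 the count still grows exponentially in n.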

Display Name

@RandallSchulz And yes, some Scala library classes are specialized, and it's very good that some FunctionN and ProductN traits are among them, but all collections and their "extensions" still enforce boxing (the only exception I have found so far is the plain old Java array).