Consider the following Set benchmark:

    import scala.collection.immutable._

    object SetTest extends App {
      def time[a](f: => a): (a,Double) = {
        val start = System.nanoTime()
        val result: a = f
        val end = System.nanoTime()
        (result, 1e-9*(end-start))
      }

      for (n <- List(1000000,10000000)) {
        println("n = %d".format(n))
        val (s2,t2) = time((Set() ++ (1 to n)).sum)
        println("sum %d, time %g".format(s2,t2))
      }
    }

Compiling and running produces

    tile:scalafab% scala SetTest
    n = 1000000
    sum 1784293664, time 0.982045
    n = 10000000
    Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space
    ...

I.e., Scala is unable to represent a set of 10 million Ints on a machine with 8 GB of memory. Is this expected behavior? Is there some way to reduce the memory footprint?
@specialized to the rescue? You could also use a fold to count, working only from a stream instead of forcing evaluation of the entire 1 to n sequence (which the ++ requires), but that has different semantics in some cases (not here) :)
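
The fold suggested in the comment above might look roughly like the sketch below. The names FoldSum, sumByFold and sumByIterator are made up for illustration and are not part of the original benchmark. The sketch assumes the goal is only the sum and that the input has no duplicates, so no Set is ever built. It also accumulates into a Long, since the sum of 1784293664 printed for n = 1000000 is the result of Int overflow.

```scala
object FoldSum {
  // Fold directly over the Range: each element is visited once and
  // nothing is stored, so memory use does not grow with n.
  // The accumulator is a Long to avoid Int overflow.
  def sumByFold(n: Int): Long =
    (1 to n).foldLeft(0L)((acc, i) => acc + i)

  // The same idea with an explicit lazy Iterator over 1..n.
  def sumByIterator(n: Int): Long =
    Iterator.range(1, n + 1).foldLeft(0L)(_ + _)

  def main(args: Array[String]): Unit = {
    for (n <- List(1000000, 10000000)) {
      println("n = %d".format(n))
      println("sum %d".format(sumByFold(n)))
    }
  }
}
```

This only sidesteps the problem when the Set itself is not needed. If the input could contain duplicates, a plain fold would count them more than once, which is the difference in semantics mentioned in the comment.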