I've written the following Scala code to compute a distance matrix:
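```scala
// Scala 2.13+: .par needs the scala-parallel-collections module plus
// import scala.collection.parallel.CollectionConverters._
def dist(fasta: Stream[FastaRecord], f: (FastaRecord, FastaRecord) => Int) = {
  val inF = fasta.par        // parallel collection holding the stream's elements
  for (i <- inF; j <- inF)   // every ordered pair (i, j), diagonal included
    yield f(i, j)
}
```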
This code works great in the sense that I get excellent parallelism. Unfortunately, I'm doing twice as much work as I need to, because f(i, j) is the same as f(j, i). What I want to do is start j at i+1 in the stream. I can do this with indices:
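One way to start j after i without indices or a `length` call is to walk the sequence's `tails`: the head of each tail plays the role of i, and the rest of that tail supplies every later j. This is a sequential sketch only; `UpperTriangle.upperPairs` is a hypothetical helper (not from the question), generic over any `Seq`, so a `Stream` works too. For n records it calls `f` n(n-1)/2 times instead of n*n.

```scala
object UpperTriangle {
  // Calls f once per unordered pair (x, y) where y comes after x in xs.
  // xs.tails yields xs, xs.tail, xs.tail.tail, ..., and finally an empty Seq:
  // the head of each non-empty tail is "i" and its remaining elements are
  // every "j > i", so no indices or length calls are needed.
  def upperPairs[A, B](xs: Seq[A])(f: (A, A) => B): List[B] =
    xs.tails
      .filter(_.nonEmpty)
      .flatMap(t => t.tail.map(y => f(t.head, y)))
      .toList
}
```

With the question's types this would be called as `UpperTriangle.upperPairs(fasta)(f)`, returning the upper-triangle distances as a `List[Int]`.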
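```scala
// Upper triangle only (j > i); the loop runs over a plain Range, so it is sequential.
for (i <- 0 until inF.length - 1; j <- i + 1 until inF.length)
  yield f(inF(i), inF(j))
```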
However, I've heard that asking a Stream for inF.length is not a good idea, and this version doesn't give me the parallelism.
I think there should be a way to do this iteration, but I haven't come up with anything yet.
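To keep the parallelism while computing only j > i, one option is to force the records into a `Vector` once (so `length` and indexing are cheap) and then compute one row of the upper triangle per task. Note the swapped-in technique: this sketch uses `scala.concurrent.Future` on the global execution context rather than `.par`, so it compiles on current Scala versions without the separate parallel-collections module. `ParallelTriangle.upperTriangle` is a hypothetical name.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelTriangle {
  // Returns one row per record: row i holds f(v(i), v(j)) for j in i+1 until n.
  def upperTriangle[A, B](records: Seq[A])(f: (A, A) => B): Vector[Vector[B]] = {
    // Force the input once; Vector length and indexing are effectively
    // constant time, unlike indexing into a Stream.
    val v = records.toVector
    val n = v.length
    // One Future per row, scheduled on the global execution context.
    val rows: Seq[Future[Vector[B]]] = (0 until n).map { i =>
      Future((i + 1 until n).map(j => f(v(i), v(j))).toVector)
    }
    // Block until every row is finished (fine for a batch computation).
    Await.result(Future.sequence(rows), Duration.Inf).toVector
  }
}
```

Rows shrink from n-1 results down to zero, so the tasks are uneven; if that matters, one option is to group a long row with a short one in a single task.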
thanks! jim
…traverses the whole stream, it should be as quick as a normal list. So I think you're better off evaluating your stream with a `length` call at the beginning and then doing only half the number of parallel calculations with your `f` function. I'm commenting this just so you understand the performance of Streams after they have been iterated over once. The `inF(i)` and `inF(j)` operations will still be slow, though, because indexing into a Stream walks it from the head.
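The behavior of a Stream after one traversal comes from memoization: a `Stream` caches each element once it has been computed, so a second pass does not redo the work. A small sketch, assuming a hypothetical `StreamMemo` helper that counts how often the mapped function runs (squaring stands in for real work). `Stream` has been deprecated in favor of `LazyList` since Scala 2.13, so newer compilers will warn on it.

```scala
object StreamMemo {
  // Builds a 5-element Stream whose map function bumps a counter, traverses
  // it twice with length, and returns the counter after each traversal.
  def evaluationsAfterTwoPasses(): (Int, Int) = {
    var evaluations = 0
    val s: Stream[Int] = Stream(1, 2, 3, 4, 5).map { x => evaluations += 1; x * x }
    s.length                  // first traversal computes (and caches) every element
    val afterFirst = evaluations
    s.length                  // second traversal reuses the cached elements
    (afterFirst, evaluations)
  }
}
```

Both counts come out as 5. Caching only removes recomputation: `s(i)` still follows i links from the head on every call, which is why converting to a `Vector` first is usually the better fit for index-based loops.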