Scala nested for loop over Streams

Question

I've written the following Scala code to compute a distance matrix:

def dist(fasta: Stream[FastaRecord], f: (FastaRecord, FastaRecord) => Int) = {
  val inF = fasta.par
  for (i <- inF; j <- inF)
    yield f(i, j)
}

This code works great in the sense that I get excellent parallelism. Unfortunately, I'm doing twice as much work as I need to, since f(i, j) is the same as f(j, i). What I want to do is start j at i+1 in the stream. I can do this with indices:

for (i <- 0 until inF.length - 1; j <- i + 1 until inF.length)
  yield f(inF(i), inF(j))

However, I've heard that calling inF.length on a Stream is not a good idea, and this version doesn't give me the parallelism.

I think there should be a way to do this iteration, but I haven't come up with anything yet.

thanks! jim

Streams cache their results, as in this example of Fibonacci calculations. So after the first time j traverses the whole stream, it should be as quick as a normal list. That means I think you're better off forcing your stream up front by asking for its length, and then doing only half the number of parallel calculations with your f function. I'm commenting just so you understand how Streams perform once they have been iterated over.
– Akos Krivachy, Commented Nov 24, 2013 at 21:33
I think the bigger problem here is that Streams aren't optimized for random access, so the inF(i) and inF(j) operations will be slow.
– DaoWen, Commented Nov 25, 2013 at 0:06
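
A minimal sketch of what the two comments above suggest, assuming the whole stream fits in memory: force the Stream once into a Vector, which has cheap indexed access, and then visit each pair with i < j exactly once. The FastaRecord case class, the distIndexed name, and the wrapper object are hypothetical stand-ins, not code from the question.

```scala
object IndexedDistSketch {
  // Hypothetical stand-in for the question's FastaRecord type.
  final case class FastaRecord(id: String, seq: String)

  // Materialize the Stream once, then compute f only for index pairs with i < j.
  def distIndexed(fasta: Stream[FastaRecord],
                  f: (FastaRecord, FastaRecord) => Int): Vector[Int] = {
    val recs = fasta.toVector // single traversal; Vector supports fast apply(i)
    val n = recs.length       // cheap on a Vector, unlike on an unevaluated Stream
    val pairs =
      for (i <- 0 until n - 1; j <- i + 1 until n)
        yield f(recs(i), recs(j))
    pairs.toVector
  }
}
```

As written this runs sequentially. On Scala 2.12 and earlier, changing the outer range to (0 until n - 1).par should bring the parallelism back; on 2.13 and later, .par needs the separate scala-parallel-collections module, and Stream itself is deprecated in favor of LazyList.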

DaoWen · Accepted Answer · 2013-11-25 01:02:48Z


I think using zipWithIndex might get you what you're looking for:

def dist(fasta: Stream[FastaRecord], f: (FastaRecord, FastaRecord) => Int) = {
  val inF = fasta.zipWithIndex.par
  for ((x, i) <- inF; (y, j) <- inF; if i <= j)
    yield f(x, y)
}

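By filtering i <= j you can eliminate the repeated (mirrored) cases. Note that i <= j still keeps the self-comparisons f(x, x); use i < j if you want to match the j = i+1 loop in the question. However, I do get a warning when I compile this: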

warning: `withFilter' method does not yet exist on scala.collection.parallel.immutable.ParSeq[(FastaRecord, Int)], using `filter' method instead

I don't think that would really be an issue, but I also don't know how to suppress the warning...
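
For comparison, here is a hedged sketch of a different technique that needs neither indices nor a filter: walk the stream's tails and pair each head only with the elements that follow it. This is not the zipWithIndex approach above. The distTails name and the wrapper object are made up for illustration, and the element type is left generic so the block stands alone.

```scala
object TailsSketch {
  // Pair each element with the elements after it only,
  // so neither the mirrored pair (y, x) nor the diagonal (x, x) is computed.
  def distTails[A](xs: Stream[A], f: (A, A) => Int): List[Int] =
    xs.tails.flatMap { t =>
      if (t.isEmpty) Stream.empty[Int]        // the last tail is the empty stream
      else t.tail.map(y => f(t.head, y))      // head paired with every later element
    }.toList
}
```

This version is sequential as written and never asks for the stream's length. Whether it beats the filtered parallel version depends on how expensive f is, so treat it as a starting point rather than a drop-in replacement.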

