45

With Scala, what is the best way to read from an InputStream into a byte array?

I can see that you can convert an InputStream to a char array:

Source.fromInputStream(is).toArray()

12 Answers

49

How about:

Stream.continually(is.read).takeWhile(_ != -1).map(_.toByte).toArray

Update: use LazyList instead of Stream (since Stream is deprecated in Scala 3)

LazyList.continually(is.read).takeWhile(_ != -1).map(_.toByte).toArray

7 Comments

Could you explain the difference between this and the variant in the question?
@Jus12 I was looking for a byte array. What I have in the question is a way to obtain the char array.
Won't that create a huge linked list, then convert it to an array? That doesn't look very efficient, in time or memory.
It looks like this does not create a linked list after all. Stream.continually produces an iterator, and takeWhile and map seem to convert iterators to iterators. E.g. evaluating Array(1, 2, 3, 4, -1).iterator.takeWhile(-1 !=).map(_.toByte) in a Scala 2.9.3 REPL gives me Iterator[Byte] = non-empty iterator.
This seemed to cause OOM errors for me. Things were GC'd eventually but the spikes were beyond what my server could handle.
47

Just removed bottleneck in our server code by replacing

Stream.continually(request.getInputStream.read()).takeWhile(_ != -1).map(_.toByte).toArray

with

org.apache.commons.io.IOUtils.toByteArray(request.getInputStream)

Or in pure Scala:

import java.io.InputStream
import java.util.Arrays

def bytes(in: InputStream, initSize: Int = 8192): Array[Byte] = {
  var buf = new Array[Byte](initSize)
  val step = initSize
  var pos, n = 0
  while ({
    // double the buffer when the next read might not fit
    if (pos + step > buf.length) buf = Arrays.copyOf(buf, buf.length << 1)
    n = in.read(buf, pos, step)
    n != -1
  }) pos += n
  // trim the buffer down to the number of bytes actually read
  if (pos != buf.length) buf = Arrays.copyOf(buf, pos)
  buf
}

Do not forget to close the opened input stream in all cases:

val in = request.getInputStream
try bytes(in) finally in.close()
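On Scala 2.13+ the try/finally can also be written with scala.util.Using, which closes the stream for you even if reading throws. A minimal sketch (the Iterator-based helper here is illustrative, standing in for any of the reading functions on this page):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.util.Using

// Illustrative reader; any InputStream => Array[Byte] function works here.
def bytes(in: InputStream): Array[Byte] =
  Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray

// Using.resource closes the stream after the body runs, even on exception.
val data = Using.resource(new ByteArrayInputStream("hello".getBytes("UTF-8")))(bytes)
```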

4 Comments

That's org.apache.commons.io.IOUtils.toByteArray, in case anyone was wondering.
This definitely feels faster. Anyone done any benchmarks or tests with larger files?
Thank you. I had huge issues with GC Overhead errors running this with Apache Spark, where 90% of the time my tasks spent in GC. Replacing with toByteArray massively sped up things.
It's important to point out how drastically this solution can out-perform the alternatives, which do things like map(_.toByte), iterating over the input byte-by-byte. Don't do that if you are working with big data!
20

In a similar vein to Eastsun's answer... I started this as a comment, but it ended up getting just a bit too long!

I'd caution against using Stream: if you hold a reference to the head element, a stream can easily consume a lot of memory.

Given that you're only going to read the file once, an Iterator is a much better choice:

def inputStreamToByteArray(is: InputStream): Array[Byte] =
  Iterator continually is.read takeWhile (-1 !=) map (_.toByte) toArray

Comments

14
import scala.tools.nsc.io.Streamable
Streamable.bytes(is)

Don't remember how recent that is: probably measured in days. Going back to 2.8, it's more like

new Streamable.Bytes { def inputStream() = is } toByteArray

4 Comments

Is it safe to use stuff from scala.tools packages? Are they even a part of the standard library?
No. But if you want to know how to write it, there it is.
It seems to have moved to the more standard scala.reflect.io package now.
scala.reflect.io.Streamable.bytes
11

With Scala IO, this should work:

def inputStreamToByteArray(is: InputStream): Array[Byte] =
   Resource.fromInputStream(is).byteArray

Comments

7

With better-files, you can simply do is.bytes

1 Comment

better.files should just be in std lib. It is so much better. Also if you want Array[Byte] you need to use is.byteArray instead.
3

Source.fromInputStream(is).map(_.toByte).toArray

1 Comment

This fails on binary/false encoded text files: stackoverflow.com/questions/13327536/…
2

How about a buffered version of the stream-based solution, plus a ByteArrayOutputStream to avoid the boilerplate of growing the final array by hand?

import java.io.{ByteArrayOutputStream, InputStream}

val EOF: Int = -1

def readBytes(is: InputStream, bufferSize: Int): Array[Byte] = {
  val buf = Array.ofDim[Byte](bufferSize)
  val out = new ByteArrayOutputStream(bufferSize)

  // read(buf) returns the number of bytes read, or -1 at end of stream
  Stream.continually(is.read(buf)) takeWhile { _ != EOF } foreach { n =>
    out.write(buf, 0, n)
  }

  out.toByteArray
}
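Since Stream is deprecated in newer Scala versions, the same buffered approach can be sketched with Iterator instead (the name readBytesIter is illustrative):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}

val EOF: Int = -1

// Same buffering idea, with Iterator replacing the deprecated Stream.
def readBytesIter(is: InputStream, bufferSize: Int = 8192): Array[Byte] = {
  val buf = new Array[Byte](bufferSize)
  val out = new ByteArrayOutputStream(bufferSize)
  Iterator.continually(is.read(buf)).takeWhile(_ != EOF).foreach(n => out.write(buf, 0, n))
  out.toByteArray
}

val result = readBytesIter(new ByteArrayInputStream(Array[Byte](1, 2, 3)))
```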

Comments

1

Here's an approach using scalaz-stream:

import scalaz.concurrent.Task
import scalaz.stream._
import scodec.bits.ByteVector

def allBytesR(is: InputStream): Process[Task, ByteVector] =
  io.chunkR(is).evalMap(_(4096)).reduce(_ ++ _).lastOr(ByteVector.empty)

2 Comments

probably no reason to reduce, that would defeat the incremental nature of streams
The reason is that the question asks for a byte array.
1

Since JDK 9:

is.readAllBytes()
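A minimal usage sketch, assuming a JDK 9+ runtime:

```scala
import java.io.ByteArrayInputStream

// InputStream.readAllBytes (JDK 9+) reads the rest of the stream in one call.
val all = new ByteArrayInputStream("abc".getBytes("UTF-8")).readAllBytes()
```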

Comments

0

We can do this using Guava's ByteStreams:

com.google.common.io.ByteStreams

Pass the stream to the ByteStreams.toByteArray method for conversion:

ByteStreams.toByteArray(stream)

Comments

-1
import java.io.InputStream
import scala.collection.mutable.ListBuffer

def inputStreamToByteArray(is: InputStream): Array[Byte] = {
    val buf = ListBuffer[Byte]()
    var b = is.read()
    while (b != -1) {
        buf.append(b.byteValue)
        b = is.read()
    }
    buf.toArray
}

1 Comment

Does List[Byte] have a method "add"?
