
Is there a way to read a file as a byte array in Spark?

As of now I am using the code below, but the content of the file changes at the byte level. It's an encrypted file, so I'm looking for a way to read it without any change at the byte level. I see a lot of questions on the same topic, but none provide a satisfactory answer, so I'm posting this question as it could help others too. Thanks

val rawFileRDDEncrypted = spark.sparkContext.textFile("path")
  • No, it doesn't resolve my question. I am trying to read only one encrypted file as a byte array. Commented Mar 3, 2021 at 8:38
  • Does sc.binaryFiles() work? Commented Mar 3, 2021 at 8:38
  • sc.binaryFiles is what's referred to in the link you shared above, and it didn't work. Commented Mar 3, 2021 at 8:44
  • Why didn't it work? The docs say it will read the file as a byte array. Commented Mar 3, 2021 at 8:45
  • spark.sparkContext.binaryFiles("path") is what I tried, and I couldn't get the bytes out of it. If you have a snippet, can you share it ... thanks Commented Mar 3, 2021 at 8:52

1 Answer


Made it work with this:

val binaryFileList = spark.sparkContext.binaryFiles("file").collect()
val byteArray: Array[Array[Byte]] = binaryFileList.map { case (_, pds) =>
  val dis = pds.open()                 // open the PortableDataStream once
  try {
    val len = dis.available()          // length of the underlying file
    val buf = Array.ofDim[Byte](len)
    dis.readFully(buf)                 // read the raw bytes without altering them
    buf
  } finally {
    dis.close()                        // avoid leaking the stream
  }
}
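
For reference, PortableDataStream also exposes toArray(), which reads the whole file into a byte array in one call, so the manual stream handling can usually be avoided. A minimal sketch of that variant, assuming the files fit in driver memory (the path below is a placeholder):

val byteArrays: Array[Array[Byte]] = spark.sparkContext
  .binaryFiles("path/to/encrypted/file")    // placeholder path
  .collect()
  .map { case (_, pds) => pds.toArray() }   // raw file contents, byte for byte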