
I am new to Scala and Apache Spark and have been trying out some online examples.

I am using scala.collection.mutable.ArrayBuffer to store a list of tuples of the form (Int,Array[String]). I am creating an ArrayBuffer and then parsing a text file line by line and appending the required data from each line to the ArrayBuffer.

The code compiles without errors, but when I access the ArrayBuffer outside the block where I append to it, it is always empty.

My code is below:

val conf = new SparkConf().setAppName("second")
val spark = new SparkContext(conf)

val file = spark.textFile("\\Desktop\\demo.txt")
var list = scala.collection.mutable.ArrayBuffer[(Int, Array[String])]()
var count = 0

file.map(_.split(","))
  .foreach { a =>
    count = countByValue(a) // returns an Int
    println("count is " + count) // showing correct output "count is 3"
    var t = (count, a)
    println("t is " + t) // showing correct output "t is (3,[Ljava.lang.String;@539f0af)"
    list += t
  }

println("list count is = " + list.length) // output "list count is = 0"
list.foreach(println) // no output

Can someone point out why this code isn't working?

Any help is greatly appreciated.

1 Answer


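I assume spark is a SparkContext. In that case it is not surprising that the local list is not updated: the closure passed to foreach is serialized and sent to the executors, so each task appends to its own copy of the list and the list on the driver is never modified. If you need a mutable value within the foreach, you should use an Accumulator.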
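The copying described here can be reproduced without a cluster. What follows is a minimal sketch in plain Scala, not Spark's actual code path: it round-trips an ArrayBuffer through Java serialization, which is roughly what happens to variables captured by a task closure, and shows that appending to the copy leaves the original empty. The names ClosureCopyDemo and roundTrip are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

object ClosureCopyDemo {
  // Hypothetical helper: serialize a value and read it back, yielding an independent copy.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverList = ArrayBuffer[(Int, String)]() // stands in for `list` on the driver
    val taskCopy = roundTrip(driverList)          // stands in for the copy a task receives
    taskCopy += ((3, "a,b,c"))                    // the task appends to its own copy only
    println("task copy length = " + taskCopy.length)     // 1
    println("driver list length = " + driverList.length) // 0
  }
}
```

The println calls inside the question's foreach show correct values for the same reason: they run against the task's copy, which does get filled.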


5 Comments

I should add that the ArrayBuffer's += is not associative, so you need to modify your requirements or use something else.
Thank you. spark is a SparkContext. I have edited the code to make it clearer. I will try to use a custom Accumulator for the ArrayBuffer. Can you suggest any alternative to +=?
No, in a reduce (accumulation) phase the order of the reduce is not defined, so the values could come back in any order. If you zipWithIndex the original RDD and carry that index as an additional tuple value, you can probably re-sort the output from the accumulator to restore the original order.
(++ is associative for ArrayBuffers, so this is probably doable after all. Sorry, it is quite late here.)
Since I do not need the values to be in order, I created a custom Accumulator for the ArrayBuffer and used ++ for ArrayBuffer in the addInPlace method of the accumulator. I am now able to get the contents of the ArrayBuffer after the foreach block.
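
The custom accumulator from the last comment is not shown in the thread. The sketch below is one way it might look, under several assumptions. It uses the AccumulatorV2 API (Spark 2.0 and later) rather than the older AccumulatorParam interface whose addInPlace method the comment refers to; here merge plays the corresponding role and concatenates with ++=. BufferAccumulator, fieldCount (a stand-in for the asker's countByValue, whose definition is not given), the in-memory sample lines, and the local[2] master are all made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable.ArrayBuffer

// Hypothetical accumulator that gathers (count, fields) pairs from all tasks.
class BufferAccumulator
    extends AccumulatorV2[(Int, Array[String]), ArrayBuffer[(Int, Array[String])]] {

  private val buffer = ArrayBuffer.empty[(Int, Array[String])]

  override def isZero: Boolean = buffer.isEmpty

  override def copy(): BufferAccumulator = {
    val acc = new BufferAccumulator
    acc.buffer ++= buffer
    acc
  }

  override def reset(): Unit = buffer.clear()

  // Called inside tasks for each element.
  override def add(v: (Int, Array[String])): Unit = buffer += v

  // Called on the driver to combine per-task results; order is not guaranteed.
  override def merge(
      other: AccumulatorV2[(Int, Array[String]), ArrayBuffer[(Int, Array[String])]]): Unit =
    buffer ++= other.value

  override def value: ArrayBuffer[(Int, Array[String])] = buffer
}

object AccumulatorDemo {
  // Assumption: stand-in for the asker's countByValue, which is not defined in the thread.
  def fieldCount(fields: Array[String]): Int = fields.length

  def main(args: Array[String]): Unit = {
    // Assumption: local master so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("second").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val acc = new BufferAccumulator
      sc.register(acc, "pairs")

      // Assumption: in-memory sample lines instead of the asker's text file.
      sc.parallelize(Seq("a,b,c", "d,e"))
        .map(_.split(","))
        .foreach(a => acc.add((fieldCount(a), a)))

      // Read the merged result on the driver, after the action has finished.
      println("list count is = " + acc.value.length)
    } finally sc.stop()
  }
}
```

As the comments note, the pairs arrive in no particular order. When the result is small enough to fit on the driver, map followed by collect() is another way to get the pairs back without an accumulator.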
