I am new to Apache Spark and I was trying to run a test application. The problem I'm facing is that when I create an RDD from the collection of data I want to process, the RDD gets created but Spark doesn't start processing it until I call the .collect method on it, so at that point I have to wait for Spark to finish. Is there some way to have Spark start processing the collection as soon as I form the RDD, so that I can call .collect later to get the processed data at any time without having to wait?
Moreover, is there any way I can use Spark to put the processed data into a database instead of returning it to me?
The code I'm using is given below:
import org.apache.spark.SparkContext

object appMain extends App {
  val spark = new SparkContext("local", "SparkTest")
  val list = List(1, 2, 3, 4, 5)
  // I want this RDD to be processed as soon as it is created
  val rdd = spark.parallelize(list.toSeq, 1).map { i =>
    i % 2 // checking whether the number is even or odd
  }
  // some more functionality here
  // the job above only starts when the line below is executed
  val result = rdd.collect()
  result.foreach(println)
}
RDDs are lazily evaluated: a job only runs when you invoke an action such as collect or saveAsTextFile - see the docs.
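If you want the job to start as soon as the RDD is defined, one option is the asynchronous variants of the actions (AsyncRDDActions). A sketch of that idea, assuming the same local setup as the question; collectAsync submits the job immediately and returns a FutureAction you can block on later:

```scala
import org.apache.spark.SparkContext

object EagerExample extends App {
  val spark = new SparkContext("local", "SparkTest")
  val rdd = spark.parallelize(List(1, 2, 3, 4, 5), 1).map(_ % 2)

  // collectAsync is an action, so the job is submitted right away
  // and runs in the background while the driver thread continues.
  val future = rdd.collectAsync()

  // ... do other work here while Spark processes the RDD ...

  // Block only when the results are actually needed.
  val result = future.get()
  result.foreach(println)
  spark.stop()
}
```

Note that the computation itself still happens on Spark's schedule; the async action just lets you kick it off early and retrieve the result later.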
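For the second question (writing the processed data to a database instead of collecting it to the driver), a common pattern is foreachPartition with one JDBC connection per partition. A sketch, assuming a hypothetical JDBC URL and a table results(value INT); substitute your own driver and credentials:

```scala
import java.sql.DriverManager
import org.apache.spark.SparkContext

object SaveToDb extends App {
  val spark = new SparkContext("local", "SparkTest")
  val rdd = spark.parallelize(List(1, 2, 3, 4, 5), 1).map(_ % 2)

  // foreachPartition is an action, so it triggers the job itself.
  // Opening one connection per partition avoids a connection per element.
  rdd.foreachPartition { iter =>
    // Hypothetical connection details -- replace with your own database.
    val conn = DriverManager.getConnection("jdbc:h2:mem:test", "sa", "")
    val stmt = conn.prepareStatement("INSERT INTO results(value) VALUES (?)")
    try {
      iter.foreach { v =>
        stmt.setInt(1, v)
        stmt.executeUpdate()
      }
    } finally {
      stmt.close()
      conn.close()
    }
  }
  spark.stop()
}
```

Because this runs on the executors, everything inside the closure (the URL, credentials, statement) must be serializable or created inside the partition function, as done here.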