I realize this question has been asked before, but I think my failure has a different cause.
// collect the reduced results back to the driver and print each key
List<Tuple2<String, Integer>> output = results.collect();
for (int i = 0; i < output.size(); i++) {
    System.out.println(output.get(i)._1);
}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: tools.MAStreamProcessor$1
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at
I have a simple 'map/reduce' program in Spark. The lines above take the results of the reduce step and loop over each resulting element; if I comment them out, I get no errors. I avoided 'forEach' and the concise for-each loop, thinking that the code they generate under the hood might produce elements that aren't serializable. I've pared it down to a plain indexed for loop, so I'm wondering why I'm still running into this error.
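For context, the overall program is structured roughly like the sketch below. The class name, input data, and function bodies are illustrative stand-ins, not my actual code; the map and reduce functions are anonymous inner classes, which I believe is where the MAStreamProcessor$1 in the stack trace comes from (javac names the first anonymous class inside a class Outer as Outer$1).

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class MapReduceSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local").setAppName("sketch"));

        JavaRDD<String> words = sc.parallelize(Arrays.asList("a", "b", "b"));

        // map step: pair each word with a count of 1
        // (javac compiles this anonymous class to MapReduceSketch$1,
        // the same naming pattern as MAStreamProcessor$1 in the trace)
        JavaPairRDD<String, Integer> pairs = words.mapToPair(
                new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String w) {
                        return new Tuple2<String, Integer>(w, 1);
                    }
                });

        // reduce step: sum the counts for each key
        JavaPairRDD<String, Integer> results = pairs.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer a, Integer b) {
                        return a + b;
                    }
                });

        // the lines from my question: collect on the driver and print keys
        List<Tuple2<String, Integer>> output = results.collect();
        for (int i = 0; i < output.size(); i++) {
            System.out.println(output.get(i)._1);
        }

        sc.stop();
    }
}

Note this sketch is only meant to show the shape of the pipeline; defined in a static context like this, the anonymous functions capture no enclosing instance, so it would not by itself reproduce the exception.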
Thanks, Ranjit