4

I realize this question has been asked before, but I think my failure is due to a different reason.

            // 'counts' stands in for the JavaPairRDD<String, Integer> produced
            // by the reduce step (the original variable name isn't shown).
            List<Tuple2<String, Integer>> results = counts.collect();
            for (int i = 0; i < results.size(); i++) {
                System.out.println(results.get(i)._1);  // was get(0), which printed the first element every time
            }


    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: tools.MAStreamProcessor$1
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
        at

I have a simple 'map/reduce' program in Spark. The lines above take the results of the reduce step and loop through each resulting element. If I comment them out, I get no errors. I stayed away from 'forEach' and the concise for-each loop, thinking that the code they generate under the hood might produce elements that aren't serializable. I've gotten it down to a plain indexed for loop, so I'm wondering why I'm still running into this error.

Thanks, Ranjit

1 Answer

8

Use the -Dsun.io.serialization.extendedDebugInfo=true JVM flag to turn on serialization debug logging. It will tell you exactly what it is unable to serialize.
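For a spark-submit application, here is a sketch of one way to pass it (the app name is assumed). Note that "Task not serializable" is thrown on the driver, and the driver JVM is already running by the time SparkConf is built, so the driver needs the flag at launch, e.g. via spark-submit's --driver-java-options or spark-defaults.conf; SparkConf can still carry it to the executors:

    SparkConf conf = new SparkConf()
        .setAppName("MAStreamProcessor")  // assumed app name
        // Executors are launched after this point, so the setting reaches them.
        // The driver itself must get the flag when the application is submitted.
        .set("spark.executor.extraJavaOptions",
             "-Dsun.io.serialization.extendedDebugInfo=true");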

The answer will have nothing to do with the lines you pasted. The collect is not the source of the problem; it is just what triggers the computation of the RDD. If you don't compute the RDD, nothing gets sent to the executors, so the accidental inclusion of something unserializable in an earlier step causes no problems until collect forces the job to run.
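For illustration, here is a minimal, self-contained sketch (class and variable names are made up, not taken from the question) of the usual way this happens with the Java API, and the usual fix:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class ClosureDemo {                        // note: NOT Serializable
        private final String prefix = ">> ";

        // BROKEN: the anonymous Function is an inner class, so it keeps a hidden
        // reference to its enclosing ClosureDemo instance. ClosureDemo is not
        // Serializable, so the first action that has to ship this task fails
        // with "Task not serializable", just as in the question.
        public JavaRDD<String> broken(JavaRDD<String> lines) {
            return lines.map(new Function<String, String>() {
                @Override
                public String call(String s) {
                    return prefix + s;                // captures the outer 'this'
                }
            });
        }

        // FIXED: copy what the task needs into a local variable; the lambda then
        // captures only a serializable String, not the enclosing instance.
        public JavaRDD<String> fixed(JavaRDD<String> lines) {
            final String p = prefix;
            return lines.map(s -> p + s);
        }

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("closure-demo").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b"));
                ClosureDemo demo = new ClosureDemo();
                List<String> ok = demo.fixed(lines).collect();   // works
                System.out.println(ok);
                // demo.broken(lines).collect();   // fails: Task not serializable
            }
        }
    }

Building the RDD with the broken function raises no error; the exception only appears when collect forces Spark to serialize the task, which is why the pasted lines look like the culprit.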


2 Comments

Thanks for the tip. I did get more debug information, but it didn't point exactly at the object that wasn't serializable; I was seeing output like '-Object blah, -Field blah, -Object blah', etc. Ultimately it turned out the culprit was a JSONObject instantiated inside the lambda function. When I moved the JSON processing into a static function and called that instead, the serialization error was resolved. Thanks much for your help!
For an overall understanding of Spark serialization, see stackoverflow.com/questions/40818001/…
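Following up on the first comment above, here is a minimal sketch of that fix (the class, helper, and field names are assumed, since the original lambda isn't shown): doing the JSON work in a static helper means the method reference carries no enclosing state into the task.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.json.JSONObject;

    public final class JsonFix {
        // Static helper: it references no enclosing instance, so the method
        // reference below pulls nothing non-serializable into the task closure.
        static String extractName(String json) {
            return new JSONObject(json).getString("name");  // "name" is a made-up field
        }

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("json-fix").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                System.out.println(
                    sc.parallelize(Arrays.asList("{\"name\":\"a\"}", "{\"name\":\"b\"}"))
                      .map(JsonFix::extractName)
                      .collect());
            }
        }
    }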
