I have a dataframe that I cannot .show(). Every time, it gives the following error. Is it possible that there is a corrupted column?

Error:

Py4JJavaError: An error occurred while calling o426.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 381.0 failed 4 times, most recent failure: Lost task 0.3 in stage 381.0 (TID 19204, ddlps28.rsc.dwo.com, executor 99): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/pyspark/worker.py", line 177, in main

  • What is your Spark version? Commented Dec 6, 2018 at 17:15
  • @AliAzG - I am using 2.2.0.cloudera1. The dataframe will .show() before I run a UDF (via Spark SQL) on it, and applying the UDF itself raises no errors. Commented Dec 6, 2018 at 20:04

1 Answer

Your error most likely isn't in the "show" operation itself. Spark evaluates lazily, so .show() is simply the action that triggers execution of your DAG, including the UDF. Since you said it works if you don't run your UDF, you probably have an error inside that UDF. The log will most likely be on the worker nodes, so try going through your Hadoop UI to reach the executor logs and see what is really breaking.
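If the executor logs are hard to reach, another way to narrow it down is to run the plain Python function behind the UDF on a handful of values pulled to the driver. This is only a minimal sketch: the question does not show the real UDF, so parse_amount, find_failing_inputs, the sample values and the "amount" column name are all hypothetical stand-ins.

```python
# Hypothetical UDF body: the asker's real function is not shown in the question.
def parse_amount(value):
    # Fails on None or non-numeric strings, a common cause of UDF errors.
    return float(value.strip())


def find_failing_inputs(func, values):
    """Call func on each value and collect (value, error) pairs for those that raise."""
    failures = []
    for value in values:
        try:
            func(value)
        except Exception as exc:  # broad on purpose: we want to see every failure
            failures.append((value, repr(exc)))
    return failures


# Stand-in for a few values pulled to the driver before the UDF is applied,
# e.g. [row["amount"] for row in df.select("amount").limit(100).collect()]
# ("amount" is an assumed column name).
sample_values = ["1.5", " 2 ", None, "n/a"]

failures = find_failing_inputs(parse_amount, sample_values)
for value, error in failures:
    print(value, "->", error)
```

Because the function runs as ordinary Python on the driver, any exception shows up with its full message instead of being buried in a Py4JJavaError. A small sample may not contain the offending row, so the executor logs remain the more reliable source.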
