
I am trying to run a PySpark script on EMR via the console. To test it, I first ran the script locally: I downloaded a small sample CSV from S3 to my computer and used spark-submit to write the aggregation results back to a local folder. Now I have to run the same script on EMR, on a cluster, because I need to do it at a much larger scale.
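For reference, the script follows roughly this shape (a minimal sketch only; the bucket, paths and column names below are placeholders, not the real ones):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation_job").getOrCreate()

# Input: locally this was the downloaded sample CSV; on EMR it is the full dataset on S3.
df = spark.read.csv("s3://my-bucket/input/data.csv", header=True, inferSchema=True)

# Placeholder aggregation: group by a key column and sum a value column.
result = df.groupBy("some_key").agg(F.sum("some_value").alias("total"))

# Output: locally a local folder; on EMR an S3 prefix.
result.write.mode("overwrite").csv("s3://my-bucket/output/aggregations/", header=True)

spark.stop()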

So far, I have tried everything I could find on Stack Overflow and other forums, but I can't get rid of the following error:

19/11/18 18:40:07 INFO RMProxy: Connecting to ResourceManager at ip-10-101-30-101.ec2.internal/10.101.30.101:8032
19/11/18 18:40:07 INFO Client: Requesting a new application from cluster with 3 NodeManagers
19/11/18 18:40:07 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
19/11/18 18:40:07 INFO Client: Will allocate AM container, with 12288 MB memory including 1117 MB overhead
19/11/18 18:40:07 INFO Client: Setting up container launch context for our AM
19/11/18 18:40:07 INFO Client: Setting up the launch environment for our AM container
19/11/18 18:40:07 INFO Client: Preparing resources for our AM container
19/11/18 18:40:08 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/11/18 18:40:09 INFO Client: Uploading resource file:/mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4/__spark_libs__4633570638919089381.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/__spark_libs__4633570638919089381.zip
19/11/18 18:40:10 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/hive-site.xml
19/11/18 18:40:11 INFO Client: Uploading resource s3a://cody-dev-bi-s3/temp/pyspark_job.py -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/pyspark_job.py
19/11/18 18:40:12 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/pyspark.zip
19/11/18 18:40:12 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/py4j-0.10.7-src.zip
19/11/18 18:40:12 INFO Client: Uploading resource file:/mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4/__spark_conf__2275605486560105863.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/__spark_conf__.zip
19/11/18 18:40:13 INFO SecurityManager: Changing view acls to: hadoop
19/11/18 18:40:13 INFO SecurityManager: Changing modify acls to: hadoop
19/11/18 18:40:13 INFO SecurityManager: Changing view acls groups to: 
19/11/18 18:40:13 INFO SecurityManager: Changing modify acls groups to: 
19/11/18 18:40:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
19/11/18 18:40:15 INFO Client: Submitting application application_1574102290151_0001 to ResourceManager
19/11/18 18:40:15 INFO YarnClientImpl: Submitted application application_1574102290151_0001
19/11/18 18:40:16 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:16 INFO Client: 
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1574102415115
     final status: UNDEFINED
     tracking URL: http://ip-10-101-30-101.ec2.internal:20888/proxy/application_1574102290151_0001/
     user: hadoop
19/11/18 18:40:17 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:18 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:19 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:20 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:21 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:22 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:23 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:24 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:25 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:26 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:27 INFO Client: Application report for application_1574102290151_0001 (state: FAILED)
19/11/18 18:40:27 INFO Client: 
     client token: N/A
     diagnostics: Application application_1574102290151_0001 failed 2 times due to AM Container for appattempt_1574102290151_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1574102290151_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001 Then click on links to logs of each attempt.
. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1574102415115
     final status: FAILED
     tracking URL: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001
     user: hadoop
19/11/18 18:40:27 ERROR Client: Application diagnostics message: Application application_1574102290151_0001 failed 2 times due to AM Container for appattempt_1574102290151_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1574102290151_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1574102290151_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1148)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1525)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/18 18:40:27 INFO ShutdownHookManager: Shutdown hook called
19/11/18 18:40:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4eb32396-6d6c-43f7-bae3-8c32d7327548
19/11/18 18:40:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4
Command exiting with ret '1'

I am probably messing up some setting in the console, since the script works when I test it locally. I think this is the screen where I am doing something wrong:

[screenshot of the EMR console configuration screen]


2 Answers


You can check the log files, which contain the detailed exception explaining why your code is failing. To find the log file location, in the EMR console click on your cluster -> click on the Summary tab -> in the Configuration details section check the Log URI value. Now go to that Log URI location on S3 and follow the path below:

<log_uri_location>/<cluster_id>/containers/application_<some_random_number>

At that location you will find stdout.gz and stderr.gz; both files can help you get the exact exception.
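If you prefer fetching the logs from a script instead of browsing S3 in the console, a rough sketch like this can download the container logs for inspection (the bucket name and cluster ID are placeholders; the application ID is the one from your output):

import boto3

s3 = boto3.client("s3")

# Placeholders: use the bucket and cluster ID from your Log URI.
bucket = "my-emr-log-bucket"
prefix = "logs/j-XXXXXXXXXXXXX/containers/application_1574102290151_0001/"

resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    key = obj["Key"]
    if key.endswith(("stdout.gz", "stderr.gz")):
        # Flatten the key into a local file name so logs from different containers don't collide.
        s3.download_file(bucket, key, key.replace("/", "_"))

Note that the logs typically appear in S3 a few minutes after the application fails, and the files are gzip-compressed.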


I seem to have solved my own problem by adding the following configuration under "Edit software settings":

[{"configurations":[{"classification":"export","properties":{"PYSPARK_PYTHON":"/usr/bin/python3"}}],"classification":"spark-env","properties":{}}]

Comments:

Thank you for this solution, it saved me. Could you explain what the problem was, how you found this solution, and why this configuration actually fixes it?
Hi Krystof, thanks for the comment. The problem was that the cluster was missing the environment-variable configuration for PySpark. The way to solve it is either to put this configuration in a JSON file and attach it to the cluster when spinning it up, or to write it in JSON syntax directly in the "Edit software settings" section.
Could you please tell us exactly what you put in your PySpark script? Did you include any imports? Thanks.
