
I am trying to run a PySpark script on EMR via the console. To test it, I first ran the script locally: I downloaded a small sample CSV from S3 to my computer and used spark-submit to write the aggregation results back to a local folder. Now I have to run the same script on EMR, on a cluster, because I need to do it at a much larger scale.
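For reference, the script follows roughly this shape (a minimal sketch only; the bucket, paths and column names below are placeholders, not the real ones):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation_job").getOrCreate()

# Input: locally this was the downloaded sample CSV; on EMR it is the full dataset on S3.
df = spark.read.csv("s3://my-bucket/input/data.csv", header=True, inferSchema=True)

# Placeholder aggregation: group by a key column and sum a value column.
result = df.groupBy("some_key").agg(F.sum("some_value").alias("total"))

# Output: locally a local folder; on EMR an S3 prefix.
result.write.mode("overwrite").csv("s3://my-bucket/output/aggregations/", header=True)

spark.stop()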

So far, I have tried everything I could find on Stack Overflow and other forums, but I can't get rid of the following error:

19/11/18 18:40:07 INFO RMProxy: Connecting to ResourceManager at ip-10-101-30-101.ec2.internal/10.101.30.101:8032
19/11/18 18:40:07 INFO Client: Requesting a new application from cluster with 3 NodeManagers
19/11/18 18:40:07 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
19/11/18 18:40:07 INFO Client: Will allocate AM container, with 12288 MB memory including 1117 MB overhead
19/11/18 18:40:07 INFO Client: Setting up container launch context for our AM
19/11/18 18:40:07 INFO Client: Setting up the launch environment for our AM container
19/11/18 18:40:07 INFO Client: Preparing resources for our AM container
19/11/18 18:40:08 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/11/18 18:40:09 INFO Client: Uploading resource file:/mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4/__spark_libs__4633570638919089381.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/__spark_libs__4633570638919089381.zip
19/11/18 18:40:10 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/hive-site.xml
19/11/18 18:40:11 INFO Client: Uploading resource s3a://cody-dev-bi-s3/temp/pyspark_job.py -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/pyspark_job.py
19/11/18 18:40:12 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/pyspark.zip
19/11/18 18:40:12 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/py4j-0.10.7-src.zip
19/11/18 18:40:12 INFO Client: Uploading resource file:/mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4/__spark_conf__2275605486560105863.zip -> hdfs://ip-10-101-30-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1574102290151_0001/__spark_conf__.zip
19/11/18 18:40:13 INFO SecurityManager: Changing view acls to: hadoop
19/11/18 18:40:13 INFO SecurityManager: Changing modify acls to: hadoop
19/11/18 18:40:13 INFO SecurityManager: Changing view acls groups to: 
19/11/18 18:40:13 INFO SecurityManager: Changing modify acls groups to: 
19/11/18 18:40:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
19/11/18 18:40:15 INFO Client: Submitting application application_1574102290151_0001 to ResourceManager
19/11/18 18:40:15 INFO YarnClientImpl: Submitted application application_1574102290151_0001
19/11/18 18:40:16 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:16 INFO Client: 
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1574102415115
     final status: UNDEFINED
     tracking URL: http://ip-10-101-30-101.ec2.internal:20888/proxy/application_1574102290151_0001/
     user: hadoop
19/11/18 18:40:17 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:18 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:19 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:20 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:21 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:22 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:23 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:24 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:25 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:26 INFO Client: Application report for application_1574102290151_0001 (state: ACCEPTED)
19/11/18 18:40:27 INFO Client: Application report for application_1574102290151_0001 (state: FAILED)
19/11/18 18:40:27 INFO Client: 
     client token: N/A
     diagnostics: Application application_1574102290151_0001 failed 2 times due to AM Container for appattempt_1574102290151_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1574102290151_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001 Then click on links to logs of each attempt.
. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1574102415115
     final status: FAILED
     tracking URL: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001
     user: hadoop
19/11/18 18:40:27 ERROR Client: Application diagnostics message: Application application_1574102290151_0001 failed 2 times due to AM Container for appattempt_1574102290151_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1574102290151_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-10-101-30-101.ec2.internal:8088/cluster/app/application_1574102290151_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1574102290151_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1148)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1525)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/18 18:40:27 INFO ShutdownHookManager: Shutdown hook called
19/11/18 18:40:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4eb32396-6d6c-43f7-bae3-8c32d7327548
19/11/18 18:40:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-c251bf55-4c00-485a-8947-617394cc3bb4
Command exiting with ret '1'

I am probably messing up some setting in the console, since the script works when I test it locally. I think this is the screen where I am doing something wrong:

[screenshot of the EMR console configuration screen]


2 Answers


You can check the log files, which contain the detailed exception explaining why your code is failing. To find the log file location, in the EMR console click on your cluster -> click on the Summary tab -> in the Configuration details section check the Log URI value. Now go to that Log URI location on S3 and follow the path below:

<log_uri_location>/<cluster_id>/containers/application_<some_random_number>

At that location you will find stdout.gz and stderr.gz; both files can help you get the exact exception.
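If you prefer fetching the logs from a script instead of browsing S3 in the console, a rough sketch like this can download the container logs for inspection (the bucket name and cluster ID are placeholders; the application ID is the one from your output):

import boto3

s3 = boto3.client("s3")

# Placeholders: use the bucket and cluster ID from your Log URI.
bucket = "my-emr-log-bucket"
prefix = "logs/j-XXXXXXXXXXXXX/containers/application_1574102290151_0001/"

resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    key = obj["Key"]
    if key.endswith(("stdout.gz", "stderr.gz")):
        # Flatten the key into a local file name so logs from different containers don't collide.
        s3.download_file(bucket, key, key.replace("/", "_"))

Note that the logs typically appear in S3 a few minutes after the application fails, and the files are gzip-compressed.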


I seem to have solved my own problem by adding the following configuration under "Edit software settings":

[{"configurations":[{"classification":"export","properties":{"PYSPARK_PYTHON":"/usr/bin/python3"}}],"classification":"spark-env","properties":{}}]

Comments:

Thank you for this solution, it saved me. Could you explain what the problem was, how you found this solution, and why this configuration actually fixes it?
Hi Krystof, thanks for the comment. The problem was that the cluster was missing the environment-variable configuration for PySpark. The way to solve it is either to put this configuration in a JSON file and attach it to the cluster when spinning it up, or to write it in JSON syntax directly in the "Edit software settings" section.
Could you please tell us exactly what you put in your PySpark script? Did you include any imports? Thanks.
