
I am executing a Python script file with PySpark 1.6.2 (yes, an old version, for certification training reasons).

spark-submit --master yarn-cluster s01.py

When run, it just keeps printing "Application report for application_somelongnumber". I was expecting it to show the output of my script so that I can check whether I developed it correctly. What should I do differently to get what I want?

The content of my script:

#!/usr/bin/python

from pyspark.sql import Row
from pyspark.sql.functions import *
from pyspark import SparkContext
sc = SparkContext(appName="solution01")

a = sc.textFile("/data/crime.csv")  # RDD of the file's lines
b = a.take(1)                       # first line, brought back to the driver as a local list
sc.stop()
print(b)                            # b is a plain Python list, so printing after stop() works

UPDATE: When I execute pyspark s01.py I see my results, but that is not the intended behaviour, because I want the script to be executed with parameters on the cluster.

1 Answer

1) Print statements will not show on your console in yarn-cluster mode, because the driver runs on the cluster rather than on your machine. To print an RDD's contents from the driver, collect it and iterate over the result (in PySpark, collect() returns a plain Python list):

for record in myRDD.collect():
    print(record)
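
A minimal sketch of the question's script with this applied (same file path and app name as above; run it in yarn-client mode so the output reaches your terminal):

#!/usr/bin/python
from pyspark import SparkContext

sc = SparkContext(appName="solution01")
a = sc.textFile("/data/crime.csv")
for line in a.take(1):  # take() returns the first line(s) as a local Python list
    print(line)         # visible on the console in yarn-client mode
sc.stop()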

2) While debugging, use yarn-client mode instead of yarn-cluster; the Spark driver then runs on the machine from which you execute the spark-submit command, so its output appears on your console.
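
For example, with the script from the question (in Spark 1.6, client mode is selected via the yarn-client master value):

spark-submit --master yarn-client s01.py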

3) When you execute a Spark application in yarn-cluster mode, the logs cannot be seen on the console while it runs. The application report includes a tracking URL with the application id; you can check the logs at that URL.

Alternatively, once execution has completed, you can download the logs from the cluster to your local machine with the command:

yarn logs -applicationId <application>
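
For example, once the job has finished (the application id below is made up for illustration; use the one from your application report), you can save the logs to a local file:

yarn logs -applicationId application_1468912345678_0001 > s01_logs.txt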
