Are there pros and cons, or different use cases, for using spark-submit to submit a Python script versus simply running the .py file with the python executable (and importing SparkSession), like this?

from pyspark.sql import SparkSession

# `master` holds the master URL, e.g. "local[*]" or "spark://host:7077"
spk = SparkSession.builder.master(master).getOrCreate()

Basically, are there any differences between running the script via python and running it via spark-submit?

  • Possible duplicate of What is the difference between spark-submit and pyspark? Commented Jun 1, 2017 at 15:59
  • pyspark runs inside a Spark shell, yeah? In this case, I just want to run the script via python and not spark-submit. Commented Jun 1, 2017 at 16:03

1 Answer

spark-submit is mostly a convenience wrapper. It lets you set all the desired configuration, environment variables, and other options at submit time.

It also lets you set JVM options, which cannot be changed on a running virtual machine. Because the JVM is initialized as soon as the Spark configuration is created, those options cannot be set from within the already-running Python process.
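To make the "set everything on submit" point concrete, here is a minimal sketch that assembles a spark-submit command line in Python. The helper `build_spark_submit_cmd`, the script name `my_job.py`, and the option values are hypothetical and chosen only for illustration. The flags themselves (`--master`, `--driver-memory`, `--driver-java-options`, `--conf`) are standard spark-submit options. The sketch only builds and prints the command; it does not launch Spark.

```python
import shlex


def build_spark_submit_cmd(script, master, driver_memory=None,
                           driver_java_options=None, conf=None):
    """Assemble a spark-submit argument list.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    cmd = ["spark-submit", "--master", master]
    if driver_memory:
        # Driver heap size is a JVM setting, so it is passed at launch time.
        cmd += ["--driver-memory", driver_memory]
    if driver_java_options:
        # Extra JVM flags for the driver process.
        cmd += ["--driver-java-options", driver_java_options]
    # Arbitrary Spark properties go through repeated --conf key=value pairs.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    return cmd


# Assumed example values; adjust to your own job and cluster.
cmd = build_spark_submit_cmd(
    "my_job.py",
    master="local[2]",
    driver_memory="4g",
    driver_java_options="-XX:+UseG1GC",
    conf={"spark.sql.shuffle.partitions": "8"},
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where spark-submit is on the PATH. The point is that JVM-level settings travel with the launch command instead of being applied from inside an already-running Python process.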


1 Comment

After running both side by side, it also appears that logging is more verbose by default with spark-submit, and that spark-submit handles cleanup chores on both failure and success.
